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NETWORK MONTTORING 
Background of th e Invention 
The invention relates to monitoring and managing 
communication networks for computers. 

Todays computer networks are large complex systems 
with many components from a large variety of vendors. 
These networks often span large geographic areas ranging 
from a campus-like setting to world wide networks. While 
the network itself can be used by many different types of 
organizations, the purpose of these networks is to move 
information. between computers. Typical applications are 
electronic mail, transaction processing, remote database, 
query, and simple file transfer. Usually, the 
organization that has installed and is running the 
network needs the network to be running properly in order 

s. Since these networks are 
complex systems, there are various controls provided by 
the different equipment to control and manage the 
network. Network management is the task of planning, 
engineering, securing and operating a network. 

To manage the network properly, the Network 
Manager has some obvious needs. First, the Network 
Manager must trouble shoot problems. As the errors 
develop in a running network, the Network Manager must 
have some tools that notify him of the errors and allow 
him to diagnose and repair these errors. Second, the 
Network Manager needs to configure the network in such a 
manner that the network loading characteristics provide 
the best service possible for the network users. To do 
this the Network Manager must have tools that allow him 
visibility into access patterns, bottlenecks and general 
loading. With such data, the Network Manager can 
reconfigure the network components for better service. 

There are many different components that need to 
be managed in the network. These elements can be, but 
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are not limited to: routers, bridges, PC's, workstations, 
minicomputers, supercomputers, printers, file servers, 
switches and pbx's. Each component provides a protocol 
for reading and writing the management variables in the 
5 machine. These variables are usually defined by the 
component vendor and are usually referred to as a 
Management information Base (MIB) . There are some 
standard MIB's, such as the IETF (Internet Engineering 
Task Force) MIB I and MIB II standard definitions. 

10 Through the reading and writing of MIB variables, 

software in other computers can manage or control the 
component. The software in the component that provides 
remote access to the MIB variables is usually called an 
agent. Thus, an individual charged with the 

15 responsibility of managing a large network often will use 
various tools to manipulate the MIB's of various agents 

on the network. 

Unfortunately, the standards for accessing MIBs 
are not yet uniformly provided nor are the MIB 

20 definitions complete enough to manage an entire network. 
The Network Manager must therefore use several different 
types of computers to access the agents in the network. 
This poses a problem, since the errors occurring on the 
network will tend to show up in different computers and 

25 the Network Manager must therefore monitor several 

different screens to determine if the network is running 
properly. Even when the Network Manager is able to 
accomplish this task, the tools available are not 
sufficient for the Network Manager to function properly. 

30 Furthermore, there are many errors and loadings on 

the network that are not reported by agents. Flow 
control problems, retransmissions, on-off segment 
loading, network capacities and utilizations are some of 
the types of data that are not provided by the agents. 
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Simple needs like charging each user for actual network 
usage are impossible. 

Summary nf the Invp^ Hn^ 
In general, in one aspect, the invention features 
5 monitoring communications which occur in a network of 
nodes, each communication being effected by a 
transmission of one or more packets among two or more 
communicating nodes, each communication complying with a 
predefined communication protocol selected from among 
10 protocols available in the network. The contents of 
packets are detected passively and in real time, 
communication information associated with multiple 
protocols is derived from the packet contents. 

Preferred embodiments of the invention include the 
15 following features. The communication information 
derived from the, packet contents is associated with 
multiple layers of at least one of the protocols. 

In general, in another aspect, the invention 
features monitoring communication dialogs which occur in 
20 a network of nodes, each dialog being effected by a 
transmission of one or more packets among two or more 
communicating nodes, each dialog complying with a 
predefined communication protocol selected from among 
protocols available in the network. Information about 
25 the states of dialogs occurring in the network and which 
comply with different selected protocols available in the 
network is derived from the packet contents. 

Preferred embodiments of the invention include the 
following features. A current state is maintained for 
30 each dialog, and the current state is updated in response 
to the detected contents of transmitted packets. For 
each dialog, a history of events is maintained based on 
information derived from the contents of packets, and the 
history of events is analyzed to derive information about 
35 the dialog. The analysis of the history includes 
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counting events and gathering statistics about events. 
The history is monitored for dialogs which are inactive, 
and dialogs which have been inactive for a predetermined 
period of time are purged. For example, the current 
state is updated to data state in response to observing 
the transmission of at least two data related packets 
from each node. Sequence numbers of data related packets 
stored in the history of events are analyzed and 
retransmissions are detected based on the sequence 
numbers. The the current state is updated based on each 
new packet associated with the dialog; if an updated 
current state cannot be determined, information about 
prior packets associated with the dialog is consulted as 
an aid in updating the state. The history of events may 
15 be searched to identify the initiator of a dialog. 

The full set of packets associated with a dialog 
up to a point in time completely define a true state of 
the dialog at that point in time, and the step of 
updating the current state in response to the detected 
20 contents of transmitted packets includes generating a 
current state (e.g., "unknown") which may not conform to 
the true state. The current state may be updated to the 
true state based on information about prior packets 
transmitted in the dialog. 
25 Each communication may involve multiple dialogs 

corresponding to a specific protocol. Each protocol 
layer of the communication may be parsed and analyzed to 
isolate each dialog and statistics may be kept for each 
dialog. The protocols may include a connectionless-type 
30 protocol in which the state of a dialog is implicit in 
transmitted packets, and the step of deriving information 
about the states of dialogs includes inferring the states 
of the dialogs from the packets. Keeping statistics for 
protocol layers may be temporarily suspended when parsing 
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and statistics gathering is not rapid enough to match the 
rate of packets to be parsed. 

In general, in another aspect, the invention 
features monitoring the operation of the network with 
5 respect to specific items of performance during normal 
operation, generating a model of the network based on the 
monitoring, and setting acceptable threshold levels for 
the specific items of performance based on the model. In 
preferred embodiments, the operation of the network is 
10 monitored with respect to the specific items of 

performance during periods which may include abnormal 
operation. 

In general, in another aspect, the invention 
features the combination of a monitor connected to the 

15 network medium for passively, and in real time, 

monitoring transmitted packets and storing information 
about dialogs associated with the packets, and a 
workstation for receiving the information about dialogs 
from the monitor and providing an interface to a user. In 

20 preferred embodiments, the workstation includes means for 
enabling a user to observe events of active dialogs. 

In general, in another aspect, the invention 
features apparatus for monitoring packet communications 
in a network of nodes in which communications may be in 

25 accordance with multiple protocols. The apparatus 

includes a. monitor connected to a communication medium of 
the network for passively, and in real time, monitoring 
transmitted packets of different protocols and storing 
information about communications associated with the 

30 packets, the communications being in accordance with 

different protocols, and a workstation for receiving the 
information about the commune i at ions from the monitor and 
providing an interface to a user. The monitor and the 
workstation include means for relaying the information 

35 about multiple protocols with respect to communication in 
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the different protocols from the monitor to the 
workstation in accordance with a single common network 
management protocol. 

In general, in another aspect, the invention 

5 features diagnosing communication problems between two 
nodes in a network of nodes interconnected by links. The 
operation of the network is monitored with respect to 
specific items of performance during normal operation. A 
model of normal operation of the network is generated 

10 based on the monitoring. Acceptable threshold levels are 
set for the specific items of performance based on the 
model. The operation of the network is monitored with 
respect to the specific items of performance during 
periods which may include abnormal operation. When 

15 abnormal operation of the network with respect to 
communication between the two nodes is detected, the 
problem is diagnosed by separately analyzing the 
performance of each of the nodes and each of the links 
connecting the two nodes to isolate the abnormal 

20 operation. 

In general, in another aspect, the invention 
features a method of timing the duration of a transaction 
of interest occurring in the course of communication 
between nodes of a network, the beginning of the 
25 transaction being defined by the sending of a first 

packet of a particular kind from one node to the other, 
and the end of the transaction being defined by the 
sending of another packet of a particular kind between 
the nodes. In the method, packets transmitted in the 
network are monitored passively and in real time. The 
beginning time of the transaction is determined based on 
the appearance of the first packet. A determination is 
made of when the other packet has been transmitted. The 
timing of the duration of the transaction is ended upon 
35 the appearance of the other packet. 
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In general, in another aspect, the invention 
features, tracking node address to node name mappings in 
a network of nodes of the kind in which each node has a 
possibly nonunigue node name and a unigue node address 
5 within the network and in which node addresses can be 
assigned and reassigned to node names dynamically using a 
name binding protocol message incorporated within a 
packet. In the method, packets transmitted in the 
network are monitored, and a table linking node names to 

10 node addresses is updated based on information contained 
in the name binding protocol messages in the packets. 

One advantage of the invention is that it enables 
a network manager to passively monitor multi-protocol 
networks at multiple layers of the communications. In 

15 addition, it organizes and presents network performance 
statistics in terms of dialogs which are occurring at any 
desired level of the communication. This technique of 
organizing and displaying network performance statistics 
provides an effective and useful view of network 

2 0 performance and facilitates a quick diagnosis of network 

problems. 

Other advantages and features will become apparent 
from the following description of the preferred 
embodiment and from the claims. 
25 Description of the Preferred Embodiments 

Fig. 1 is a block diagram of a network; 

Fig. 2 shows the layered structure of a network 
communication and a protocol tree within that layered 
environment ; 

3 0 Fig. 3 illustrates the structure of an 

ethernet/ IP/TCP packet; 

Fig. 4 illustrates the different layers of a 
communication between two nodes; 

Fig. 5 shows the software modules within the 
35 Monitor; 
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Fig. 6 shows the structure of the Monitor software 
in terms of tasks and intertask communication mechanisms; 

Figs, 7a-c show the STATS data structures which 
store performance statistics relating to the the data 

5 link layer; 

Fig. 8 is a event/ state table describing the 
operation of the state machine for a TCP connection; 

Fig. 9a is a history data structure that is 
identified by a pointer found in the appropriate dialog 
10 statistics data within STATS; 

Fig. 9b is a record from the history table; 

Fig. 10 is a flow diagram of the 
Look_for_Data_State routine; 

Fig. 11 is a flow diagram of the 
15 LookJEorJEnitiator routine that is called by the 
Look_forJData_State routine; 

Fig. 12 is a flow diagram of the 
Look_for_Retransmission routine which is called by the 
Look_at_History routine; 
20 Fig- 13 is a diagram of the major steps in 

processing a frame through the Real Time Parser (RTP) ; 

Fig. 14 is a diagram of the major steps in the 
processing a statistics threshold event; 

Fig. 15 is a diagram of the major steps in the 
25 processing of a database update; 

Fig. 16 is a diagram of the major, steps in the 
processing of a monitor control request; 

Fig. 17 is a logical map of the network as 
displayed by the Management Workstation; 
3 0 Fig. 18 is a basic summary tool display screen; 

Fig. 19 is a protocol selection menu that may be 
invoked through the summary tool display screen; 

Figs. 20a-g are examples of the statistical 
variables which are displayed for different protocols; 



Fig. 21 is an example of information that is 
displayed in the dialogs panel of the summary tool 
display screen; 

Fig. 22 is a basic data screen presenting a rate 
5 values panel, a count values panel and a protocols seen 
panel; 

Fig. 23 is a traffic matrix screen; 
Fig. 24 is a flow diagram of the algorithm for 
adaptively establishing network thresholds based upon 
10 actual network performance; 

Fig. 25 is a simple multi-segment network; 





Fig. 26 is a flow diagram 
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Fig. 27 is a 
analyzer algorithm; 


flow 


diagram 
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the 


source node 




Fig. 28 is a 
analyzer algorithm; 


flow 


diagram 
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the 


sink node 




Fig. 29 is a 
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Fig. 30 is a 
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flow 


diagram 
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DLL problem 




Fig. 31 is a 
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Fig. 32 is a flow 
problem checking routine; 


diagram 


of 


the 


IP link component 




Fig. 33 is a 
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component problem checking routine; 

Fig. 34 shows the structure of the event timing 
database ; 

30 Fig. 3 5 is a flow diagram of the operation of the 

event timing module (ETM) in the Network Monitors- 
Fig. 3 6 is a network which includes an Appletalk® 
segment ; 

Fig. 37 is a Name Table that is maintained by the 
35 Address Tracking Module (ATM) ; 
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Fig. 38 is a flow diagram of the operation of the 
ATM; and 

Fig. 39 is a flow diagram of the operation of the 

ATM. 

5 Also attached hereto before the claims are the 

following appendices: 

Appendix I identifies the SNMP MIB subset that is 
supported by the Monitor and the Management Workstation 
(2 pages) ; 

10 Appendix II defines the extension to the standard 

MIB that are supported by the Monitor and the Management 

Workstation (25 pages) ; 

Appendix III is a summary of the protocol 

variables for which the Monitor gathers statistics and a 
15 brief description of the variables, where appropriate (17 

pages) ; 

Appendix IV is a list of the Summary Tool Values 
Display Fields with brief descriptions (2 pages) ; and 

Appendix V is a description of the actual screens 
20 for the Values Tool (34 pages) . 
g tructyT-e and Operation 
Th*> Network: 

A typical network, such as the one shown in Fig. 
1, includes at least three major components, namely, 

25 network nodes 2, network elements 4 and communication 

lines 6. Network nodes 2 are the individual computers on 
the network. They are the very reason the network 
exists. They include but are not limited to workstations 
(WS) , personal computers (PC) , file servers (FS) , compute 

30 servers (CS) and host computers (e.g., a VAX), to name 
but a few. The term server is often used as though it 
was different from a node, but it is, in fact, just a 
node providing special services. 

In general, network elements 4 are anything that 

35 participate in the service of providing data movement in 
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a network, i.e., providing the basic communications. 
They include, but are not limited to, LAN«s, routers, 
bridges, gateways, multiplexors, switches and connectors. 
Bridges serve as connections between different network 
5 segments. They keep track of the nodes which are 
connected to each of the segments to which they are 
connected. When they see a packet on one segment that is 
addressed to a node on another of their segments, they 
grab the packet from the one segment and transfer it to 

10 the proper segment. Gateways generally provide 

connections between different network segments that are 
operating under different protocols and serve to convert 
communications from one protocol to the other. Nodes 
send packets to routers so that they may be directed over 

15 the appropriate segments to the intended destination 
node. 

Finally, network or communication lines 6 are the 
components of the network which connect nodes 2 and 
elements 4 together so that communicatons between nodes 2 

20 may take place. They can be private lines, satellite 
lines or Public Carrier lines. They are expensive 
resources and are usually managed as separate entities. 
Often networks are organized into segments 8 that are 
connected by network elements 4. a segment 8 is a 

25 section of a LAN connected at a physical level (this may 
include repeaters) . within a segment, no protocols at 
layers above the physical layer are needed to enable 
signals from two stations on the same segment to reach 
each other (i.e., there are no routers, bridges, 

30 gateways. . . ) . 

The NetWork Monitor and the Mana gement Workstation ; 

In the described embodiment, there are two basic 
elements to the monitoring system which is to be. 
described, namely, a Network Monitor 10 and a Management 
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Workstation 12. Both elements interact with each other 
over the local area network (LAN) . 

Network Monitor 10 (referred to hereinafter simply 
as Monitor 10) is the data collection module which is 
5 attached to the LAN. It is a high performance real time 
front end processor which collects packets on the network 
and performs some degree of analysis to search for actual 
or potential problems and to maintain statistical 
information for use in later analysis. In general, it 

10 performs the following functions. It operates in a 
promiscuous mode to capture and analyze all packets on 
the segment and it extracts all items of interest from 
the frames. It generates alarms to notify the Management 
Workstation of the occurence of significant events. It 

15 receives commands from the Management Workstation, 
processes them appropriately and returns responses. 

Management Workstation 12 is the operator 
interface. It collects and presents troubleshooting and 
performance information to the user. It is based on the 

20 SunNet Manager (SNM) product and provides a graphical 
network-map-based interface and sophisticated data 
presentation and analysis tools. It receives information 
from Monitor 10, stores it and displays the information 
in various ways. It also instructs Monitor 10 to perform 

25 certain actions. Monitor 10, in turn, sends responses 
and alarms to Management Workstation 12 over either the 
primary LAN or a backup serial link 14 using SNMP with 
the MIB extensions defined later. 

These devices can be connected to each other over 

30 various types of networks and are not limited to 

connections over a local area network. As indicated in 
Fig. 1, there can be multiple Workstations 12 as well as 
multiple Monitors 10. 

Before describing these components in greater 

35 detail, background information will first be reviewed 



regarding communication protocols which specify how 
communications are conducted over the network and 
regarding the structure of the packets. 
The Protocol Tree * 

As shown in Fig. 2, communication over the network 
is organized as a series of layers or levels, each one 
built upon the next lower one, and each one specified by 
one or more protocols (represented by the boxes) . Each 
layer is responsible for handling a different phase of 
the communication between nodes on the network. The 
protocols for each layer are defined so that the services 
offered by any layer are relatively independent of the 
services offered by the neighbors above and below. 
Although the identities and number of layers may differ 
depending on the network (i.e., the protocol set defining 
communication oyer the network), in general, most of them 
share a similar structure and have features in common. 

For purposes of the present description, the Open 
Systems Interconnection (OSI) model will be presented as 
representative of structured protocol architectures. The 
OSI model, developed by the International Organization 
for Standardization, includes seven layers. As indicated 
in Fig. 2, there is a physical layer, a data link layer 
(DLL) , a network layer, a transport layer, a session 
layer, a presentation layer and an application layer, in 
that order. As background for what is to follow, the 
function of each of these layers will be briefly 
described. 

The physical layer provides the physical medium 
for the data transmission. It specifies the electrical 
and mechanical interfaces of the network and deals with 
bit level detail. The data link layer is responsible for 
ensuring an error-free physical link between the 
communicating nodes. It is responsible for creating and 
recognizing frame boundaries (i.e., the boundaries of the 
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packets of data that are sent over the network.) The 
network layer determines how packets are routed within 
the network. The transport layer accepts data from the 
layer above it (i.e., the session layer), breaks the 
5 packets up into smaller units, if required, and passes 
these to the network layer for transmission over the 
network. It may insure that the smaller pieces all 
arrive properly at the other end. The session layer is 
the user's interface into the network. The user must 
10 interface with the session layer in order to negotiate a 
connection with a process in another machine. The 
presentation layer provides code conversion and data 
reformatting for the user's application. Finally, the 
application layer selects the overall network service for 
15 the user's application. 

Fig. 2 also shows the protocol tree which is 
implemented by the described embodiment. A protocol tree 
shows the protocols that apply to each layer and it 
identifies by the tree structure which protocols at each 
20 layer can run -on top of* the protocols of the next lower 
layer. Though standard abbreviations are used to 
identify the protocols, for the convenience of the 
reader, the meaning of the abbreviations are as follows: 
ARP Address Resolution Protocol 

25 ETHERNET Ethernet Data Link Control 

FTP File Transfer Protocol 

ICMP internet Control Message Protocol 

IP Internet Protocol 

LLC 802.2 Logical Link Control 

MAC 802.3 CSMA/CD Media Access Control 

NFS Network File system 

NSP Name Server Protocol 

rarp Reverse Address Resolution Protocol 

SMTP Simple Mail Transfer Protocol 

SNMP simple Network Management Protocol 
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TCP Transmission Control Protocol 

TFTP Trivial File Transfer Protocol 

UDP User Datagram Protocol 

Two terms are commonly used to describe the protocol 
5 tree, namely, a protocol stack and a protocol family (or 
suite) . A protocol stack generally refers to the 
underlying protocols that are used when sending a message 
over a network. For example, FTP / TCP / IP / LLC is a 
protocol stack. A protocol family is a loose association 
10 of protocols which tend to be used on the same network 
(or derive from a common source). Thus, for example, the 
TCP/IP family includes IP, TCP, UDP, ARP, TELNET and FTP. 
The Decnet family includes the protocols from Digital 
Equipment Corporation. And the SNA family includes the 
15 protocols from IBM. 
The Packet! 

The relevant protocol stack defines the structure 
of each packet that is sent over the network. Fig. 3, 
which shows an TCP/IP packet, illustrates the typical 

20 structure of a packet. In general, each level of the 
protocol stack takes the data from the next higher level 
and adds header information to form a protocol data unit 
(PDU) which it passes to the next lower level. That is, 
as the data from the application is passed down through 

25 the protocol layers in preparation for transmission over 
the network, each layer adds its own information to the 
data passed down from above until the complete packet is 
assembled. Thus, the structure of a packet ressembles 
that of an onion, with each PDU of a given layer wrapped 

30 within the PDU of the adjacent lower level. 

At the ethernet level, the PDU includes a 
destination address (DEST MAC ADDR) , a source address 
(SRC MAC ADDR) , a type (TYPE) identifying the protocol 
which is running on top of this layer, and a DATA field 

35 for the PDU from the IP layer. 
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Like the ethernet packet, the PDU for the IP layer 
includes an IP header plus a DATA field. The IP header 
includes a type field (TYPE) for indicating the type of 
service, a length field (LGTH) for specifying the total 

5 length of the PDU, an identification field (ID) , a 

protocol field (PROT) for identifying the protocol which 
is running on top of the IP layer (in this case, TCP), a 
source address field (SRC ApDR) for specifying the IP 
address of the sender, a. destination address field (DEST 
10 ADDR) for specifying the IP address of the destination 
node, and a DATA field. 

The PDU built by the TCP protocol also consists of 
a header and the data passed down from the next higher 
layer. In this case the header includes a source port 

15 field (SRC PORT) for specifying the port number of the 
sender, a destination port field (DEST PORT) for 
specifying the port number of the destination, a sequence 
number field (SEQ NO.) for specifying the sequence number 
of the data that is being sent in this packet, and an 

20 acknowledgment number field (ACK NO.) for specifying the 
number of the acknowledgment being returned. It also 
includes bits which identify the packet type, namely, an 
acknowledgment bit (ACK) , a reset connection bit (RST) , a 
synchronize bit (SYN) , and a no more data from sender bit 

25 (FIN) . There is also a window size field (WINDOW) for 
specifying the size of the window being used. 
The concept °f a Dialog: 

The concept of a dialog is used throughout the 
following description. As will become apparent, it is a 

3 0 concept which provides a useful way of conceptualizing, 
organizing and displaying information about the 
performance of a network - for any protocol and for any 
layer of the multi-level protocol stack. 

As noted above, the basic unit of information in 

3 5 communication is a packet. A packet conveys meaning 
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between the sender and the receiver and is part of a 
larger framework of packet exchanges. The larger 
exchange is called a dialog within the context of this 
document. That is, a dialog is a communication between a 
sender and a receiver, which is composed of one or more 
packets being transmitted between the two. There can be 
multiple senders and receivers which can change roles. 
In fact, most dialogs involve exchanges in both 
directions . 

Stated another way, a dialog is the exchange of 
messages and the associated meaning and state that is 
inherent in any particular exchange at any layer. It 
refers to the exchange between the peer entities 
(hardware or software) in any communication, in those 
15 situations where there is a layering of protocols, any 
particular message exchange could be viewed as belonging 
to multiple dialogs. For example, in Fig. 4 Nodes A and 
B are exchanging packets and are engaged in multiple 
dialogs. Layer 1 in Node A has a dialog with Layer l in 
20 Node B. For this example, one could state that thia is 
the data link layer and the nature of the dialog deals 
with the message length, number of messages, errors and 
perhaps the guarantee of the delivery. Simultaneously, 
Layer n of Node A is having a dialog with Layer n of node 
25 B. For the sake of the example, one could state that 
this is an application layer dialog which deals with 
virtual terminal connections and response rates. One can 
also assume that all of the other layers (2 through n-l) 
are also having simultaneous dialogs. 
30 In some protocols there are explicit primitives 

that deal with the dialog and they are generally referred 
to as connections or virtual circuits. However, dialogs 
exist even in stateless and connectionless protocols. 
Two more examples will be described to help clarify the 
35 concept further, one dealing with a connection oriented 
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protocol and the other dealing with a connectionless 
protocol. 

In a typical connection oriented protocol, Node A 
sends a connection request (CR) message to Node B. The 
5 CR is an explicit request to form a connection. This is 
the start of a particular dialog, which is no different 
from the start of the connection. Nodes A and B could 
have other dialogs active simultaneously with this 
particular dialog. Each dialog is seen as unique. A 

10 connection is a particular type of dialog. 

In a typical connectionless protocol, Node A sends 
Node B a message that is a datagram which has no 
connection paradigm, in fact, neither do the protocol (s) 
at higher layers. The application protocol designates 

15 this as a request to initiate some action. For example, 
a file server protocol such as Sun Microsystems' Network 
File System (NFS) could make a mount request. A dialog 
comes into existence once the communication between Nodes 
A and B has begun. It is possible to determine that 

20 communication has occurred and to determine the actions 
being requested. If in fact there exists more than one 
communication thread between Nodes A and B, then these 
would represent separate, different dialogs. 
TnfHH*. Network Monitor: 

25 Monitor 10 includes a MIPS R3000 general purpose 

microprocessor (from MIPS Computer Systems, Inc.) running 
at 25 MHz. It is capable of providing 20 mips processing 
power. Monitor 10 also includes a 64Kbyte instruction 
cache and a 64Kbyte data cache, implemented by SRAM. 

30 The major software modules of Monitor 10 are 

implemented as a mixture of tasks and subroutine 
libraries as shown in Fig. 5. It is organized this way 
so as to minimise the context switching overhead incurred 
during critical processing sequences. There is NO 

35 PREEMPTION of any module in the monitor subsystem. Each 
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module is cognizant of the fact that it should return 
control to the kernel in order to let other tasks run. 
Since the monitor subsystem is a closed environment, the 
software is aware of real time constraints* 

Among the major modules which make up Monitor 10 
is a real time kernel 20, a boot/load module 22, a driver 
24, a test module 26, an SNMP Agent 28, a Timer module 
30, a real time parser (RTP) 32, a Message Transport 
Module (MTM) 34, a statistics database (STATS) 36, an 
Event Manager (EM) 38, an Event Timing Module (ETM) 40 
and a control module 42. Each of these will now be 
described in greater detail. 

Real Time Kernel 20 takes care of the general 
housekeeping activities in Monitor 10. It is responsible 
for scheduling, handling intertask communications via 
queues, managing a potentially large number of timers, 
manipulating linked lists, and handling simple memory 
management . 

Boot/Load Module 22, which is FProm based, enables 
Monitor 10 to start itself when the power is turned on in 
the box. It initializes functions such as diagnostics, 
and environmental initialization and it initiates down 
loading of the Network Monitor Software including program 
and configuration files from the Management Workstation. 
Boot/ load module 22 is also responsible for reloading 
program and/or configuration data following internal 
error detection or on command from the Management 
Workstation. To accomplish down loading, boot/ load 
module 22 uses the Trivial File Transfer Protocol (TFTP) . 
The protocol stack used for loading is 

TFTP /UDP/ IP /ether net over the LAN and TFTP/UDP/IP/SLIP 
over the serial line. 

Device Driver 24 manages the network controller 
hardware so that Monitor 10 is able to read and write 
packets from the network and it manages the serial 
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interface. It does so both for the purposes of 
monitoring traffic (promiscuous mode) and for the 
purposes of communicating with the Management Workstation 
and other devices on the network. The communication 
5 occurs through the network controller hardware of the 
physical network (e.g. Ethernet). The drivers for the 
IAN controller and serial line interface are used by the 
boot load module and the MTM. They provide access to the 
chips and isolate higher, layers from the hardware 

10 specifics. 

Test module 26 performs and reports results of 
physical layer tests (TDR, connectivity,...) under 
control of the Management Workstation. It provides 
traffic load information in response to user requests 
15 identifying the particular traffic data of interest. The 
load information is reported either as a percent of 
available bandwidth or as frame size(s) plus rate. 

SNMP Agent 28 translates requests and information 
into the network management protocol being used to 
20 communicate with the Management Workstation, e.g., the 
Simple Network Management Protocol (SNMP) . 

Control Module 42 coordinates access to monitor 
control variables and performs actions necessary when 
these are altered. Among the monitor control variables 
25 which it handles are the following: 

set reset monitor - transfer control to reset 

logic; 

set time of day - modify monitor hardware clock 
and generate response to Management Workstation; 
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get time of day - read monitor hardware clock 
generate response to Workstation; 
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set trap permit - send trap control ITM to EM and 
generate response to Workstation; 

get trap permit - generate response to 
Workstation; 

5 Control module 42 also updates parse control records 

within STATS when invoked by the RTP (to be described) or 
during overload conditions so that higher layers of 
parsing are dropped until the overload situation is 
resolved. When overload is over it restores full 

10 parsing. 

Timer 30 is invoked periodically to perform 
general housekeeping functions. It pulses the watchdog 
timer at appropriate intervals. It also takes care of 
internal time stamping and kicking off routines like the 

15 EM routine which periodically recalculates certain 

numbers within the statistical database (i.e., STATS). 

Real Time Parser (RTP) 32 sees all frames on the 
network and it determines which protocols are being used 
and interprets the frames. The RTP includes a protocol 

20 parser and a state machine. The protocol parser parses a 
received frame in the "classical" manner, layer-by- layer , 
lowest layer first. The parsing is performed such that 
the statistical objects in STATS (i.e., the network 
parameters for which performance data is kept) are 

25 maintained. Which layers are to have statistics stored 
for them is determined by a parse control record that is 
stored in STATS (to be described later) . As each layer 
is parsed, the RTP invokes the appropriate functions in 
the statistics module (STATS) to update those statistical 

3 0 objects which must be changed. 

The state machine within RTP 3 2 is responsible for 
tracking state as appropriate to protocols and 
connections. It is responsible for maintaining and 
updating the connection oriented statistical elements in 
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STATS. In order to track connection states and events, 
the RTP invoices a routine within the state machine. This 
routine determines the state of a connection based on 
past observed frames and keeps track of sequence numbers. 
5 It is the routine that determines if a connection is in 
data transfer state and if a retransmission has occurred. 
The objectives of the state machine are to keep a brief 
history of events, state transitions, and sequence 
numbers per connection; to detect data transfer state so 

10 that sequence tracking can begin; and to count 

inconsistencies but still maintain tracking while falling 
into an appropriate state (e.g. unknown) . 

RTP 32 also performs overload control by 
determining the number of frames awaiting processing and 

15 invoking control module 42 to update the parse control 

records so as to reduce the parsing depth when the number 
becomes too large. 

Statistics Module (STATS) 36 is where Monitor 10 
keeps information about the statistical objects it is 

20 charged with monitoring. A statistical object represents 
a network parameter for which performance information is 
gathered. This information is contained in an extended 
MIB (Management Information Base) , which is updated by 

RTP 32 and EM 38. 

25 STATS updates statistical objects in response to 

RTP invocation. There are at least four statistical 
object classes, namely, counters, timers, percentages 
(%) , and meters. Each statistical object is implemented 
as appropriate to the object class to which it belongs. 

30 That is, each statistical object behaves such that when 
invoked by RTP 32 it updates and then generates an alarm 
if its value meets a preset threshold. (Meets means that 
for a high threshold the value is equal to or greater 
than the threshold and for a low threshold the value is 
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equal to or less than the threshold. Note that a single 
object may have both high and low thresholds.) 

STATS 36 is responsible for the maintenance and 
initial analysis of the database. This includes 
5 coordinating access to the database variables, ensuring 
appropriate interlocks are applied and generating alarms 
when thresholds are crossed. Only STATS 36 is aware of 
the internal structure of the database, the rest of the 
system is not. 

10 STATS 36 is also responsible for tracking events 

of interest in the form of various statistical 
reductions. Examples are counters, rate meters, and rate 
of change of rate meters, it initiates events based on 
particular statistics reaching configured limits, i.e., 

15 thresholds. The events are passed to the EM which sends 
a trap (i.e., an, alarm) to the Management Workstation. 
The statistics within STATS 36 are readable from the 
Management Workstation on request. 

STATS performs lookup on all addressing fields. 

20 It assigns new data structures to address field values 
not currently present. It performs any hashing for fast 
access to the database. More details will be presented 
later in this document. 

Event Manager (EM) 38 extracts statistics from 

25 STATS and formats it in ways that allow the Workstation 
to understand it. it also examines the various 
statistics to see if their behavior warrants a 
notification to the Management Workstation. If so, it 
uses the SNMP Agent software to initiate such 

30' notifications. 

If the Workstation asks for data, EM 38 gets the 
data from STATS and sends it to the Workstation. It also 
performs some level of analysis for statistical, 
accounting and alarm filtering and decides on further 

35 action (e.g. delivery to the Management Workstation). 
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EM 38 is also responsible for controlling the delivery of 
events to the Management Workstation, e.g., it performs 
event filtering. The action to be taken on receipt of an 
event (e.g. threshold exceeded in STATS) is specified by 
the event action associated with the threshold. The 
event is used as an index to select the defined action 
(e.g. report to Workstation, run local routine xxxx, 
ignore) . The action can be modified by commands from the 
Management workstation (e.g., turn off an alarm) or by 
the control module in an overload situation. An update 
to the event action, however, does not affect events 
previously processed even if they are still waiting for 
transmission to the Management Workstation. Discarded 
events are counted as such by EM 38. 
15 EM 38 also implements a throttle mechanism to 

limit the rate of delivery of alarms to the console based 
on configured limits. This prevents the rapid generation 
of multiple alarms. In essence, Monitor 10 is given a 
maximum frequency at which alarms may be sent to the 
20 Workstation. Although alarms in excess of the maximum 
frequency are discarded, a count is kept of the number of 
alarms that were discarded. 

EM 38 invokes routines from the statistics module 
(STATS) to perform periodic updates such as rate 
25 calculations and threshold checks. It calculates time 
average, e.g., average traffic by source stations, 
destination stations. EM 38 requests for access to 
monitor control variables are passed to the control 
module . 

30 EM 38 checks whether asynchronous traps (i.e., 

alarms) to the Workstation are permitted before 

generating any. 

EM 38 receives database update requests from the 
Management Workstation and invokes the statistics module 
35 (STATS) to process these. 
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Message Transport Module (MTM) 34, which is DRAM 
based, has two distinct but closely related functions. 
First, it is responsible for the conversion of 
Workstation commands and responses from the internal 
format used within Monitor io to the format used to 
communicate over the network. it isolates the rest of 
the system from the protocol used to communicate within 
Management Workstation. it translates between the 
internal representation of data and ASN.l used for SNMP. 
It performs initial decoding of Workstation requests and 
directs the requests to appropriate modules for 
processing, it implements SNMP /UDP/ IP /LLC or ETHERNET 
protocols for LAN and SNMP/UDP/IP/SLIP protocols for 
serial line. It receives network management commands 
from the Management Workstation and delivers these to the 
appropriate module for action. Alarms and responses 
destined for the Workstation are also directed via this 
module. 

Second, MTM 34 is responsible for the delivery and 
reception of data to and from the Management Workstation 
using the protocol appropriate to the network. Primary 
and backup communication paths are provided transparently 
to the rest of the monitor modules (e.g. LAN and dial up 
link) . It is capable of full duplex delivery of messages 
25 between the console and monitoring module. The messages 
carry event, configuration, test and statistics data. 

Event Timing Module (ETM) 40 keeps track of the 
start time and end times of user specified transactions 
over the network. In essence, this module monitors the 
responsiveness of the network at any protocol or layer 
specified by the user. 

Address Tracking Module 42 keeps track of the node 
name to node address bindings on networks which implement 
dynamic node addressing protocols. 
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Memory management for Monitor 10 is handled in 
accordance with following guidelines. The available 
memory is divided into four blocks during system 
initialization. One block includes receive frame 
5 buffers. They are used for receiving LAN traffic and for 
receiving secondary link traffic. These are organized as 
linked lists of fixed sized buffers. A second block 
includes system control message blocks. They are used 
for intertask messages within Monitor 10 and are 
10 organized as a linked list of free blocks and multiple 
linked lists of in process intertask messages. A third 
block includes transmit buffers. They are used for 
creation and transmission of workstation alarms and 
responses and are organized as a linked list of fixed 
sized buffers. A fourth block is the statistics. This - 
is allocated as a fixed size area at system 
initialization and managed by the statistics module 
during system operation. 
Taste Structure of Monitor; 

The structure of the Monitor in terms of tasks and 
intertask messages is shown in Fig. 6. The rectangular 
blocks represent interrupt service routines, the ovals 
represent tasks and the circles represent input queues. 

Each task in the system has a single input queue 
25 which it uses to receive all input. All inter-process 
communications take place via messages placed onto the 
input queue of the destination task. Each task waits on 
a (well known) input queue and processes events or inter- 
task messages (i.e., ITM's) as they are received. Each 
30 task returns to the kernel within an appropriate time 
period defined for each task (e.g. after processing a 
fixed number of events) . 

Interrupt service routines (ISR's) run on receipt 
of hardware generated interrupts. They invoke task level 
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processing by sending an ITM to the input queue of the 
appropriate task. 

The kernel scheduler acts as the base loop of the 
system and calls any runnable tasks as subroutines. The 
5 determination of whether a task is runnable is made from 
the input queue, i.e., if this has an entry the task has 
work to perform. The scheduler scans the input queues 
for each task in a round robin fashion and invokes a task 
with input pending. Each task processes items from its 

10 input queue and returns to the scheduler within a defined 
period. The scheduler then continues the scan cycle of 
the input queues. This avoids any task locking out 
others by processing a continuously busy input queue. A 
task may be given an effectively higher priority by 

15 providing it with multiple entries in the scan table. 

Database accesses are generally performed using 
access routines. This hides the internal structure of 
the database from other modules and also ensures that 
appropriate interlocks are applied to shared data. 

20 The EM processes a single event from the input 

queue and then returns to the scheduler. 

The MTM Xmit task processes a single event from 
its input queue and then returns control to the 
scheduler. The MTM Recv task processes events from the 

25 input queue until it is empty or a defined number (e.g. 
10) events have been processed and then returns control 
to the scheduler. 

The timer task processes a single event from the 
input queue and then returns control to the scheduler. 

3 0 RTP continues to process frames until the input 

queue is empty or it has processed a defined number (e.g. 
10) frames. It then returns to the scheduler. 

The following sections contain a more detailed 
description of some of the above-identified software 

35 modules* 
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statistics Module f STATS} : 

The fvmctions of the statistics module are: 

* to define statistics records; 

* to allocate and initialize statistics records; 

5 * to provide routines to lookup statistics records, 

e.g. lookup_id_addr ; 

* to provide routines to manipulate the statistics 
within the records, e.g. stats_age, stats_incr and 
stats_rate; 

10 * to provide routines to free statistics records, 

e.g. stats_allocate and stats_deallocate 
It provides these services to the Real Time Parser 
(RTP) module and to the Event Manager (EM) module. 

STATS defines the database and it contains 

15 subroutines for updating the statistics which it keeps. 

STATS contains the type definitions for all 
statistics records (e.g. DLL, IP, TCP statistics) . It 
provides an initialization routine whose major function 
is to allocate statistics records at startup from 

20 cacheable memory. It provides lookup routines in order 
to get at the statistics. Each type of statistics record 
has its own lookup routine (e.g. lookup_ip_address) which 
returns a pointer to a statistics record of the 
appropriate type or NULL. 

25 As a received frame is being parsed, statistics 

within statistics records need to be manipulated (e.g. 
incremented) to record relevant information about the 
frame. STATS provides the routines to manipulate those 
statistics. For example, there is a routine to update 

30 counters. After the counter is incremented/ decremented 
and if there is a non-zero threshold associated with the 
counter, the internal routine compares its value to the 
threshold. If the threshold has been exceeded, the Event 
Manager is signaled in order to send a trap to the 

35 Workstation. Besides manipulating statistics, these 



routines, if necessary, signal the Event Manager via an 
Intertask Message (ITM) to send a trap to the Management 
Workstation. 

The following is an example of some of the 
statistics records that are kept in STATS. 

o monitor statistics 

o mac statistics for segment 

o 11c statisics for segment 

o statistics per ethernet/lsap type for segment 

o ip statistics for segment 

o icmp statistics for segment 

o tcp statistics for segment 

o udp statistics for segment 

o nfs statistics for segment 

o ftp control statistics for segment 

o ftp data statistics for segment 

o telnet statistics for segment 

o smtp statistics for segment 

o arp statistics for segment 

o statistics per mac address 

o statistics per ethernet type/lasp per mac 

address 

o statistics per ip address (includes icmp) 

o statistics per tcp socket 

o statistics per udp socket 

o statistics per nfs socket 

o statistics per ftp control socket 

o statistics per ftp data socket 

o statistics per telnet socket 

o statistics per smtp socket 

o arp statistics per ip address 

o statistics per mac address pair 

o statistics per ip pair (includes icmp) 
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o statistics per tcp connection 
o statistics per udp pair 
o statistics per nfs pair 
o- statistics per ftp control connection 
5 o statistics per ftp data connection 

o statistics per telnet connection 
o statistics per smtp connection 

o connection histpries per udp and tcp socket 

All statistics are organized similarly across protocol 
10 types. The details of the data structures for the DLL 
level are presented later. 

As noted earlier, there are four statistical 
object classes (i.e., variables), namely, counts, rates, 
percentages (%) , and meters. They are defined and 
15 implemented as follows. 

A count is a continuously incrementing variable 
which rolls around to 0 on overflow. It may be reset on 
command from the user (or from software) . A threshold 
may be applied to the count and will cause an alarm when 
20 the threshold count is reached. The threshold count 

fires each time the counter increments past the threshold 
value. For example, if the threshold is set to 5, alarms 
are generated when the count is 5 , 10 , 15 , . . . 

A rate is essentially a first derivative of a 
25 count variable. The rate is calculated at a period 

appropriate to the variable. For each rate variable, a 
minimum, maximum and average value is maintained. 
Thresholds may be set on high values of the rate. The 
maximums and minimums may be reset on command. The 
3 0 threshold event is triggered each time the rate 
calculated is in the threshold region. 

As commonly used, the % is calculated at a period 
appropriate to the variable. For each % variable a 
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minimum, maximum and average value is maintained. A 
threshold may be set on high values of the %. The 
threshold event is triggered each time the % calculated 
is in the threshold region. 
5 Finally, a meter is a variable which may take any 

discrete value within a defined range. The current value 
has no correlation to past or future values. A threshold 
may be set on a maximum and/or minimum value for a meter. 
The rate and % fields of network event variables 

10 are updated differently than counter or meter fields in 
that they are calculated at fixed intervals rather than 
on receipt of data from the network. 

Structures for statistics kept on a per address or 
per address pair basis are allocated at initialization 

15 time. There are several sizes for these structures. 

Structures of the same size are linked together in a free 
pool. As a new structure is needed, it is obtained from 
a free queue, initialized, and linked into an active 
list. Active lists are kept on a per statistics type 

20 basis. 

As an address or address pair (e.g. mac, ip, 
tcp...) is seen, RTP code calls an appropriate lookup 
routine. The lookup routine scans active statistics 
structures to see if a structure has already been 

25 allocated for the statistics. Hashing algorithms are 
used in order to provide for efficient lookup. if no 
structure has been allocated, the lookup routine examines 
the appropriate parse control records to determine 
whether statistics should be kept, and, if so, it 

3 0 allocates a structure of the appropriate size, 
initializes it and links it into an active list. 

Either the address of a structure or a NULL is 
returned by these routines. If NULL is returned, the RTP 
does not stop parsing, but it will not be allowed to 
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store the statistics for which the structure was 
requested . 

The RTP updates statistics within the data base as 
it runs. This is done via macros defined for the RTP. 
5 The macros call on internal routines which know how to 
manipulate the relevant statistic. If the pointer to the 
statistics structure is NULL, the internal routine will 

not be invoked. 

The EM causes rates to be calculated. The STATS 
10 module supplies routines (e.g. stats_rate) which must be 
called by the EM in order to perform the rate 
calculations. It also calls subroutines to reformat the 
data in the database in order to present it to the 
workstation (i.e., in response to a get from the 

15 Workstation) . 

The calculation algorithms for the rate and % 
fields of network event variables are as follows. 

The following rates are calculated in units per 
second, at the indicated (approximate) intervals: 
20 1. 10 second intervals: 

e.g. DLL frame, byte, ethernet, 802.3, broadcast, 
multicast rates 
2. 60 second intervals 

e.g., all DLL error, ethertype/dsap rates 
25 all IP rates. 

TCP packets, bytes, errors, retransmitted packets, 
retransmitted bytes, acks, rsts 
UDP packet, error, byte rates 
FTP file transfer, byte transfer, error rates 
30 For these rates, the new average replaces the 

previous value directly. Maximum and minimum values are 
retained until reset by the user. 

The following rates are calculated in units per 
hour at the indicated time intervals: 
35 i. 15 minute interval. 
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e.g., TCP - connection rate 
Telnet connection rate 
FTP session rate 

The hourly rate is calculated from a sum of the 
last twelve 5 minute readings, as obtained from the 
buckets for the pertinent parameter. Each new reading, 
replaces the oldest of the twelve values maintained. 
Maximum and minimum values are retained until reset by 
the user. 

There are a number of other internal routines in 
STATS. For example, all statistical data collected by 
the Monitor is subject to age out. Thus, if no activity 
is seen for an address (or address pair) in the time 
period defined for age out, then the data is discarded 
and the space reclaimed so that it may be recycled, in 
this manner, the. Monitor is able to use the memory for 
active elements rather than stale data. The user can 
select the age out times for the different components. 
The EM periodically kicks off the aging mechanism to 
perform this recycling of resources. STATS provides the 
routines which the EM calls, e.g. stats_age. 

There are also routines in STATS to allocate and 
de-allocate Statistics, e.g., stats_al locate and 
stats_de-allocate. The allocate routine is called when 
stations and dialogs are picked up by the Network 
Monitor. The de-allocate routine is called by the aging 
routines when a structure is to be recycled. 
The Data Structu res in stat« ; 

The general structure of the database within STATS 
is illustrated by Figs. 7a-c, which shows information 
that is maintained for the Data Link Layer (DLL) and its 
organization. A set of data structures is kept for each 
address associated with the layer. In this case there 
are three relevant addresses, namely a segment address, 
indicating which segment the node is on, a MAC address 



WO 92/19054 



PCT/US92/02995 



- 34 - 

for the node on the segment, and an address which 
identifies the dialog occurring over that layer. The 
dialog address is the combination of the MAC addresses 
for the two nodes which make up the dialog. Thus, the 
5 overall data structure has three identifiable components: 
a segment address data structure (see Fig. 7a) , a MAC 
address data structure (see Fig. 7b) and a dialog data 
structure (see Fig. 7c) . 

The segment address structure includes a doubly 

10 linked list 102 of segment address records 104, each one 
for a different segment address. Each segment address 
record 104 contains a forward and backward link (field 
106) for forward and backward pointers to neighboring 
records and a hash link (field 108). In other words, the 

15 segment address records are accessed by either walking 
down the doubly linked list or by using a hashing 
mechanism to generate a pointer into the doubly linked 
list to the first record of a smaller hash linked list. 
Each record also contains the address of the segment 

20 (field 110) and a set of fields for other information. 
Among these are a flags field 112, a type field 114, a 
parse_control field 116, and an EM_control field 118. 
Flags field 112 contains a bit which indicates whether 
the identified address corresponds to the address of 

25 another Network Monitor. This field only has meaning in 
the MAC address record and not in the segment or dialog 
address record. Type field 114 identifies the MIB group 
which applies to this address. Parse control field 116 
is a bit mask which indicates what subgroups of 

30 statistics from the identified MIB group are maintained, 
if any. Flags field 112, type field 114 and parse 
control field 116 make up what is referred to as the 
parse control record for this MAC address. The Network 
Monitor uses a default value for parse control field 116 

35 upon initialization or whenever a new node is detected. 
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The default value turns off all statistics gathering. 
The statistics gathering for any particular address may 
subsequently be turned on by the Workstation through a 
Network Monitor control command that sets the appropriate 
bits of the parse control field to one. 

EM_control field us identifies the subgroups of 
statistics within the MIB group that have changed since 
the EM last serviced the database to update rates and 
other variables. This field is used by the EM to 
identify those parts of STATS which must be updated or 
for which recalculations must be performed when the EM 
next services STAT. 

Each segment address record 104 also contains 
three fields for time related information. There is a 
start_time field 120 for the time that is used to perform 
some of the rate calculations for the underlying 
statistics; a first_seen field 122 for the time at which 
the Network Monitor first saw the communication; and a 
last_seen field 124 for the time at which the last 
20 communication was seen. The last_seen time is used to 
age out the data structure if no activity is seen on the 
segment after a preselected period of time elapses. The 
first_seen time is a statistic which may be of interest 
to the network manager and is thus retrievable by the 
25 Management Workstation for display. 

Finally, each segment address record includes a 
stats_pointer field 126 for a pointer to a DLL segment 
statistics data structure 130 which contains all of the 
statistics that are maintained for the segment address. 
If the bits in parse_control field 116 are all set to 
off, indicating that no statistics are to be maintained 
for the address, then the pointer in stats_pointer field 
126 is a null pointer. 

The list of events shown in data structure 13 0 of 
Fig. 7a illustrates the type of data that is collected 



30 



35 



WO 92/19054 



PCT/US92/02995 



- 36 - 

for this address when the parse control field bits are 
set to on. Some of the entries in DLL segment statistics 
data structure 130 are pointers to buckets for historical 
data. In the case where buckets are maintained, there 
5 are twelve buckets each of which represents a time period 
of five minutes duration and each of which generally 
contains two items of information, namely, a count for 
the corresponding five minute time period and a MAX rate 
for that time period. MAX rate records any spikes which 

10 have occurred during the period and which the user may 
not have observed because he was not viewing that 
particular statistic at the time. 

At the end of DLL segment statistics data 
structure 130, there is a protocol_Q pointer 132 to a 

15 linked list 134 of protocol statistics records 136 

identifying all of the protocols which have been detected 
running on top of the DLL layer for the segment. Each 
record 136 includes a link 138 to the next record in the 
list, the identity of the protocol (field 140), a frames 

20 count for the number of frames detected' for the 

identified protocol (field 142); and a frame rate (field 
144) . 

The MAC address data structure is organized in a 
similar manner to that of the segment data structure (see 

25 Fig. 7b) . There is a doubly linked list 146 of MAC 

address records 148, each of which contains the same type 
of information as is stored in DLL segment address 
records 104. A pointer 150 at the end of each MAC 
address record 148 points to a DLL address statistics 

3 0 data structure 152, which like the DLL segment address 
data structure 13 0, contains fields for all of the 
statitics that are gathered for that DLL MAC address. 
Examples of the particular statistics are shown in Fig. 
7b. 
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At the end of DLL address statistics data 
structure is 2 , there are two poihter fields 1M and 154 
one for a pointer to a record 1 58 in a dialog linTgueue 
5 pro^ T ° ther 3 P ° lnter t0 3 x« of 

zz is 8 s c at r tios reooras 164 • Each ^ ^ * 

l^in th! " S 3 P ° inter t0 " eXt ent ^ 
^0 „h h qU6Ue " C ° ntainS a dial °* addr pointer 
which t0 ^ in «" DLL gLe 

10 ZTisZl T T ^ aSdreSS - (S * e "* >*> '"toco! 
contain ~ C ° rdS 164 ^ the Sane ^ucture and 
contain the sane categories of information as their 

. , ^ ure3, That is, dialog record 172 

includes the same categories o, information as ij 

tT^a^sTt^ S69Bent addre88 ^ a » d 
the^MAC address data structure. The address field 174 

contains the addresses of both ends of the dialog 

20 ""catenated together to form a single address. Th e 

frbitrari^T"? SddreSSeS "'^ the sl "* la «^ss are 
the IZl de8l9nated nodes * — ». respectively, m 
the statsjointer field 176 there is a pointer to a 
dialog statistics data structure 178 containing the 
flrstT " atiStiCS £or tha The entries in the 

and 182 are designated protocol entries and protocols 

vIIchT en H rleS ±S ^ nUnb8r ° f ' Protocol 

which have been seen between the two MAC addresses Th. 
30 protocols that have been seen are enumerated the 
protocols field 182. 

DLL dialog statistics data structure 178 
illustrated by Fig . 7c , includes seyeral additio ' nal 
fields of information which only appear in these 
35 structures for dialogs for which state information can be 
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kept (e.g. TCP connection) . The additional fields 
identify the transport protocol (e.g., TCP) (field 184) 
and the application which is running on top of that 
protocol (field 186) . They also include the identity of 
5 the initiator of the connection (field 188), the state of 
the connection (field 190) and the reason that the 
connection was closed, when it is closed (field 192) . 
Finally, they also include a state_po inter (field 194) 
which points to a history data structure that will be 

10 described in greater detail later. Suffice it to say, 
that the history data structure contains a short history 
of events and states for each end of the dialog. The 
state machine uses the information contained in the 
history data structure to loosely determine what the 

15 state of each of the end nodes is throughout the course 
of the connection. The qualifier "loosely" is used 
because the state machine does not closely shadow the 
state of the connection and thus is capable of recovering 
from loss of state due to lost packets or missed 

2 0 communications • 

The above-described structures and organization 
are used for all layers and all protocols within STATS. 
Real Time Parser fRTP) 

The RTP runs as an application task. It is 
25 scheduled by the Real Time Kernel scheduler when received 
frames are detected. The RTP parses the frames and 
causes statistics, state tracking, and tracing operations 
to be performed. 

The functions of the RTP are: 

3 0 * obtain frames from the RTP Input Queue; 

* parse the frames; 

* maintain statistics using routines supplied by the 
STATS module; 

* maintain protocol state information; 
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* notify the MTM via an ITM if a frame has been 
received with the Network Monitor's address as the 
destination address; and 

* notify the EM via an ITM if a frame has been 

5 received with any Network Monitor's address as the 

source address. 

The design of the RTP is straightforward. It is a 
collection of routines which perform protocol parsing. 
The RTP interfaces to the Real Time Kernel in order to 
10 perform RTP initialization, to be scheduled in order to 
parse frames, to free frames, to obtain and send an ITM 
to another task; and to report fatal errors. The RTP is 
invoked by the scheduler when there is at least one frame 
to parse. The appropriate parse routines are executed 
15 per frame. Each parse routine invokes the next level 
parse routine or, decides that parsing is done. 
Termination of the parse occurs on an error or when the 
frame has been completely parsed. 

Each parse routine is a separately compilable 
20 module. In general, parse routines share very little 

data. Each knows where to begin parsing in the frame and 
the length of the data remaining in the frame. 

The following is a list of the parse routines that 
are available within RTP for parsing the different 
25 protocols at the various layers. 

Data Link Layer Parse - rtp_dll_parse: 

This routine handles Ethernet, IEEE 802.3, IEEE 
802.2, and SNAP; See RFC 1010, Assigned Numbers 
for a description of SNAP (Subnetwork Access 
30 Protocol) . 

Address Resolution Protocol Parse - rtp_arp_parse 

ARP is parsed as specified in RFC 826. 
Internet Protocol Parse - rtp_ip_parse 

IP Version 4 is parsed as specified in RFC 791 as 
35 amended by RFC 950, RFC 919, and RFC 922. 
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internet Control Message Protocol Parse - rtp__icmp_j>arse 

ICMP is parsed as specified in RFC 792. 
Unit Data Protocol Parse - rtp_udp_parse 

UDP is parsed as specified in RFC 768. 
5 Transmission Control Protocol Parse - rtp_tcp_parse 

TCP is parsed as specified in RFC 793. 
Simple Mail Transfer Protocol Parse - rtp_smtp_parse 

SMTP is parsed as specified in RFC 821. 
File Transfer Protocol Parse - rtp__ftpjparse 
10 FTP is parsed as specified in RFC 959. 

Telnet Protocol Parse - rtp_telnet__parse 

The Telnet protocol is parsed as specified in RFC 

854. 

Network File System Protocol Parse - rpt_nfs_parse 
15 The NFS protocol is parsed as specified in RFC 

1094. 

The RTP calls routines supplied by STATS to look 
up data structures. By calling these lookup routines, 
global pointers to data structures are set up. Following 
20 are examples of the pointers to statistics data 

structures that are set up when parse routines call 
Statistics module lookup routines. 

mac_segment, mac_dst_segment, mac_this_segment , 

mac src, mac_dst, mac__dialog 
25 ip_src_segment, ip_dst_segment , ip_this_segment , 

ip_src, ip_dst, ip_dialog 

tcp_src__segment, tcp_dst_segment , 

tcp_this_segment , 

tcp_src, tcp_dst, tcp_src_socket, tcp_dst_socket , 
3 o tcp_connection 

The mac_src and mac_dst routines return pointers 
to the data structures within STATS for the source MAC 
address and the destination MAC address, respectively. 
The lookup_mac_dialog routine returns a pointer to the 
35 data structure within STATS for the dialog between the 
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two nodes on the MAC layer. The other STATS routines 
supply similar pointers for data structures relevant to 
other protocols. 

The RTP routines are aware of the names of the 
5 statistics that must be manipulated within the data base 
(e.g. frames, bytes) but are not aware of the structure 
of the data. When a statistic is to be manipulated, the 
RTP routine invokes a macro which manipulates the 
appropriate statistics in data structures. The macros 
10 use the global pointers which were set up during the 
lookup process described above. 

After a frame has been parsed (whether the parse 
was successful or not) , the RTP routine examines the 
destination mac and ip addresses, if either of the 
15 addresses is that of the Network Monitor, RTP obtains a 
low priority ITM, initializes it, and sends the ITM to 
the MTM task. One of the fields of the ITM contains the 
address of the buffer containing the frame. 

The RTP must hand some received frames to the EM 
20 in order to accomplish the autotopology function 
(described later) . After a frame has been parsed 
(whether the parse was successful or not) , the RTP 
routine examines the source mac and ip addresses, if 
either of the addresses is that of another Network 
25 Monitor, RTP obtains a low priority ITM, initializes it 
and sends the ITM to the EM task. The address data 
structure (in particular, the flags field of the parse 
control record) within STATS for the MAC or the IP 
address indicates whether the source address is that of 
30 another Network Monitor. One of the fields of the ITM 
contains the address of the buffer containing the frame. 

The RTP receives traffic frames from the network 
for analysis. RTP operation may be modified by sending 
control messages to the Monitor. RTP first parses these 
messages, then detects that the messages are destined for 
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the Monitor and passes them to the MTM task. Parameters 
which affect RTP operation may be changed by such control 
messages . 

the general operation of the RTP upon receipt of a 
5 traffic frame is as follows: 

Get next frame from input queue 

get address records for these stations 

For each level of active parsing 

{ 

10 get pointer to start of protocol header 

call layer parse routine 
determine protocol at next level 
set pointer to start of next layer protocol 



15 



}end of frame parsing 

if this is a monitor command add to MTM input 
queue 

if this frame is from another monitor, pass 
to EM 

check for overload -if yes tell control 

20 The state Machine; 

In the described embodiment, the state machine 
determines and keeps state for both addresses of all TCP 
connections. TCP is a connection oriented transport 
protocol, and TCP clearly defines the connection in terms 

25 of states of the connection. There are other protocols 
which do not explicitly define the communication in terms 
of state, e.g. connectionless protocols such as NFS. 
Nevertheless, even in the connectionless protocols there 
is implicitly the concept of state because there is an 

30 expected order to the events which will occur during the 
course of the communication. That is, at the very least, 
one can identify a beginning and an end of the 
communication, and usually some sequence of events which 
will occur during the course of the communication. Thus, 



even though the described embodiment involves a 
connection oriented protocol, the principles are 
applicable to many connectionless protocols or for that 
matter any protocol for which one can identify a 
5 beginning and an end to the communication under that 
protocol . 

Whenever a TCP packet is detected, the RTP parses 
the information for that layer to identify the event 
associated with that packet. It then passes the 

10 identified event along with the dialog identifier to the 
state machine. For each address of the two parties to 
the communication, the state machine determines what the 
current state of the node is. The code within the state 
machine determines the state of a connection based upon a 

15 set of rules that are illustrated by the event/ state 
table shown in Pig. 8. 

The interpretation of the event/ state table is as 
follows. The top row of the table identifies the six 
possible states of a TCP connection. These states are 

20 not the states defined in the TCP protocol specification. 
The left most column identifies the eight events which 
may occur during the course of a connection. Within the 
table is an array of boxes, each of which sits at the 
intersection of a particular event /state combination. 

25 Each box specifies the actions taken by the state machine 
if the identified event occurs while the connection is in 
the identified state. When the state machine receives a 
new event, it may perform three types of action. It may 
change the recorded state for the node. The state to 

3 0 which the node is changed is specified by the S= ,I STATE" 
entry located at the top of the box. It may increment or 
decrement the appropriate counters to record the 
information relevant to that event's occurrence. (In the 
table, incrementing and decrementing are signified by the 

35 ++ and the — symbols, respectively, located after the 
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15 



identity of the variable being updated.) Or the state 
machine may take other actions such as those specified xn 
the table as start close timer, Look_f or_Data_State , or 
Look_at_History (to be described shortly) . The 
particular actions which the state machine takes are 
specified in each box. An empty box indicates that no 
action is taken for that particular event/ state 
combination. Note, however, that the occurrence of an 
event is also likely to have caused the update of 
statistics within STATS , if not by the state machine, 
then by some other part of the RTP. Also note that xt 
may be desirable to have the state machine record other 
events, in which case the state table would be modified 
to identify those other actions. 

Two events appearing on the table deserve further 
explanation, namely, close timer expires and inactivity 
timer expires. The close timer, which is specified by 
TCP, is started at the end of a connection and it 
establishes a period during which any old packets for the 
connection which are received are thrown away (i.e., 
ignored) . The inactivity timer is not specified by TCP 
but rather is part of the Network Monitor's resource 
management functions. Since keeping statistics for 
dialogs (especially old dialogs) consumes resources, xt 
is desirable to recycle resources for a dialog if no 
activity has been seen for some period of time. The 
inactivity timer provides the mechanism for accomplishxng 
this It is restarted each time an event for the 
connection is received. If the inactivity timer expires 
30 (i e., if no event is received before the timer perxod 
ends) , the connection is assumed to have gone inactive 
and all of the resources associated with the dialog are 
recycled. This involves freeing them up for use by other 
dialogs. 
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The other states and events within the table 
differ from but are consistent with the definitions 
provided by TCP and should be self evident in view of 
that protocol specification. 

The event/ state table can be read as follows. 
Assume, for example, that node l is in DATA state and the 
RTP receives another packet from node 1 which it 
determines to be a TCP FIN packet. According to the 
entry in the table at the intersection of FIN/DATA (i.e., 
event/state) , the state machine sets the state of the 
connection for node l to CLOSING, it decrements the 
active connections counter and it starts the close timer. 
When the close timer expires, assuming no other events 
over that connection have occurred, the state machine 
sets node l*s state to CLOSED and it starts the 
inactivity timer. If the RTP sends another SYN packet to 
reinitiate a new connection before the inactive timer 
expires, the state machine sets node l f s state to 
CONNECTING (see the SYN/ CLOSED entry) and it increments 
an after close counter. 

When a connection is first seen, the Network 
Monitor sets the state of both ends of the connection to 
UNKNOWN state. If some number of data and acknowledgment 
frames are seen from both connection ends, the states of 
the connection ends may be promoted to DATA state. The 
connection history is searched to make this determination 
as will be described shortly. 

Referring to Figs. 9a-b, within STATS there is a 
history data structure 200 which the state machine uses 
to remember the current state of the connection, the 
state of each of the nodes participating in the 
connection arid a short history of state related 
information. History data structure 200 is identified by 
a state_pointer found at the end of the associated dialog 
statistics data structure in STATS (see Fig. 7c) . Within 
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history data structure 200, the state machine records the 
current state of node 1 (field 202), the current state of 
node 2 (field 206) and other data relating to the 
corresponding node (fields 204 and 208) . The other data 
5 includes, for example, the window size for the receive 
and transmit communications, the last detected sequence 
numbers for the data and acknowledgment frames, and other 
data transfer information. 

History data structure 200 also includes a history 
10 table (field 212) for storing a short history of events 
which have occurred over the connection and it includes 
an index to the next entry within the history table for 
storing the information about the next received event 
(field 210) . The history table is implemented as a 
15 circular buffer which includes sufficient memory to 
store, for example, 16 records. Each record, shown in 
Fig. 9b, stores the state of the node when the event was 
detected (field 218), the event which was detected (i.e., 
received) (field 220), the data field length (field 222), 
20 the sequence number (field 224), the acknowledgment 
sequence number (field 226) and the identity of the 
initiator of the event, i.e., either node 1 or node 2 or 
0 if neither (field 228). 

Though the Network Monitor operates in a 
25 promiscuous mode, it may occasionally fail to detect or 
it may, due to overload, lose a packet within a 
communication. If this occurs the state machine may not 
be able to accurately determine the state of the 
connection upon receipt of the next event. The problem 
30 is evidenced by the fact that the next event is not what 
was expected. When this occurs, the state machine tries 
to recover state by relying on state history information 
stored in the history table in field 212 to deduce what 
the state is. To deduce the current state from 
historical information, the state machine uses one of the 
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two previously mentioned routines, namely, 
Look_for_Data_State and Look_at_History . 

Referring to Fig. 10, Look_for_Data_State routine 
230 searches back through the history one record at a 
5 time until it finds evidence that the current state is 
DATA state or until it reaches the end of the circular 
buffer (step 232). Routine 230 detects the existence of 
DATA state by determining whether node l and node 2 each 
have had at least two data events or two acknowledgment 
10 combinations with no intervening connect, disconnect or 
abort events (step 234) . if such a sequence of events is 
found within the history, routine 230 enters both node l 
and node 2 into DATA state (step 236), it increments the 
active connections counter (step 238) and then it calls a 
15 Look_for_mitiator routine to look for the initiator of 
the connection (step 240) . if such a pattern of events 
is not found within the history, routine 230 returns 
without changing the state for the node (step 242) . 

As shown in Pig. n, Look_for_lnitiator routine 
20 240 also searches back through the history to detect a 
telltale event pattern which identifies the actual 
initiator of the connection (step 244) . More 
specifically, routine 240 determines whether nodes l and 
2 each sent connect-related packets. If they did, 
25 routine 240 identifies the initiator as the first node to 
send a connect-related packet (step 246) . if the search 
is not successful, the identity of the connection 
initiator remains unknown (step 248) . 

The Look_at_History routine is called to check 
30 back through the history to determine whether data 
transmissions have been repeated, in the case of 
retransmissions, the routine calls a 

Look_for_Retransmission routine 250, the operation of 
which is shown in Pig. 12. Routine 250 searches back 
35 through the history (step 252) and checks whether the 
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same initiator node has sent data twice (step 254). It 
detects this by comparing the current sequence number of 
the packet as provided by the RTP with the sequence 
numbers of data packets that were previously sent as 
5 reported in the history table. If a retransmission is 
spotted, the retransmission counter in the dialog 
statistics data structure of STATS is incremented (step 
256) . If the sequence number is not found within the 
history table, indicating that the received packet does 

10 not represent a retransmission, the retransmission 
counter is not incremented (step 258) . 

Other statistics such as window probes and keep 
alives may also be detected by looking at the received 
frame, data transfer variables, and, if necessary, the 

15 history. 

Even if frames are missed by the Network Monitor, 
because it is not directly "shadowing" the connection, 
the Network Monitor still keeps useful statistics about 
the connection. If inconsistencies are detected the 
20 Network Monitor counts them and, where appropriate, drops 
back to UNKNOWN state. Then, the Network Monitor waits 
for the connection to stabilize or deteriorate so that it 
can again determine the appropriate state based upon the 

history table. 
25 PTHnrHnal Ty^^inn^ of Network Monitor MOdAlles ; 

The transactions which represent the major portion 

of the processing load within the Monitor, include 

monitoring, actions on threshold alarms, processing 

database get/ set requests from the Management 
30 Workstation, and processing monitor control requests from 

the Management Workstation. Each of these mechanisms 

will now be briefly described. 

Monitoring involves the message sequence shown in 

Fig. 13. In that figure, as in the other figures 
35 involving message sequences, the numbers under the 
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heading SEQ. identify the major steps in the sequence. 
The following steps occur: 

1. ISR puts Received traffic frame ITM on RTP input 
queue 

5 2. request address of pertinent data structure from 
STATS (get parse control record for this station) 

3. pass pointer to RTP 

4. update statistical objects by call to statistical 
update routine in STATS using pointer to pertinent 

10 data structure 

5. parse completed - release buffers 

The major steps which follow a statistics 
threshold event (i.e., an alarm event) are shown in Fig. 
14. The steps are as follows: 
15 l. statistical object update causes threshold alarm 

2. STATS generates threshold event ITM to event 
manager (EM) 

3. look up appropriate action for this event 

4. perform local event processing 

20 5. generate network alarm ITM to MTM Xmit (if 
required) 

6. format network alarm trap for Workstation from 
event manager data 

7. send alarm to Workstation 

25 The major steps in processing of a database update 

request (i.e., a get /set request) from the Management 
Workstation are shown in Fig. 15. The steps are as 
follows: 

1. LAN ISR receives frame from network and passes it 
30 to RTP for parsing 

2. RTP parses frame as for any other traffip on 
segment. 

3. RTP detects frame is for monitor and sends 
received Workstation message over LAN ITM to MTM 

3 5 Recv . 
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4. MTM Recv processes protocol stack. 

5*. MTM Recv sends database update request ITM to EM. 

6. EM calls STATS to do database read or database 

write with appropriate IMPB 
5 7. STATS performs database access and returns 

response to EM. 

8. EM encodes response to workstation and sends 
database update response ITM to MTM Xmit 

9. MTM Xmit transmits. 

10 The major steps in processing of a monitor control 

request from the Management Workstation are shown in Fig. 
16. The steps are as follows: 

1. Lan ISR receives frame from network and passes 
received frame ITM to RTP for parsing. 
15 2. RTP parses frame as for any other traffic on 
segment . 

3. RTP detects frame is for monitor and sends 

received workstation message over LAN ITM to MTM 
Recv. 

20 4. MTM Recv processes protocol stack and decodes 
workstation command. 

5. MTM Recv sends request ITM to EM. 

6. EM calls control with monitor control IMPB. 

7. control performs requested operation and generates 
25 response to EM. 

b. EM sends database update response ITM to MTM Xmit. 
9. MTM Xmit encodes response to Workstation and 
transmits . 
Tfr° ttnmtof /workstation Interface; 
30 The interface between the Monitor and the 

Management Workstation is based on the SNMP definition 
(RFC 1089 SNMP; RFC 1065 SMI; RFC 1066 SNMP MIB - Note: 
RFC means Request for Comments) . All five SNMP PDU types 
are supported: 
3 5 get-r eque st 



get-next-request 
get-response 
set-request 
trap 

5 The SNMP MIB extensions are designed such that where 

possible a user request for data maps to a single complex 
MIB object. In this manner, the get-request is simple 
and concise to create, and the response should contain 
all the data necessary to build the screen. Thus, if the 

10 user requests the IP statistics for a segment this maps 
to an IP Segment Group. 

The data in the Monitor is keyed by addresses 
(MAC, IP) and port numbers (telnet, FTP) . The user may 
wish to relate his data to physical nodes entered into 

15 the network map. The mapping of addresses to physical 
nodes is controlled by the user (with support from the 
Management Workstation system where possible) and the 
Workstation retains this information so that when a user 
requests data for node 'Joe' the Workstation asks the 

20 Monitor for the data for the appropriate address (es) . 
The node to address mapping need not be one to one. 

Loading and dumping of monitors uses TFTP (Trivial 
File Transfer Protocol) . This operates over UDP as does 
SNMP. The Monitor to Workstation interface follows the 

25 SNMP philosophy of operating primarily in a polled mode. 
The Workstation acts as the master and polls the Monitor 
slaves for data on a regular (configurable) basis. 

The information communicated by the SNMP is 
represented according to that subset of ASN.i (ISO 8824 

30 Specification of ASN.i) defined in the Internet standard 
Structure of Management Information (SMI - RFC 1065) . 
The subset of the standard Management Information Base 
(MIB) (RFC 1066 SNMP MIB) which is supported by the 
Workstation is defined in Appendix III. The added value 

35 provided by the Workstation is encoded as enterprise 



WO 92/19054 



PCT7US92/02995 



' - 52 - 

specific extensions to the MIB as defined in Appendix IV. 
The format for these extensions follows the SMI 
recomendations for object identifiers so that the 
workstation extensions fall in the subtree 
5 I.3.6.I.4.I.X.I. where x is an enterprise specific node 
identifier assigned by the IAB. 

Appendix V is a summary of the network variables 
for which data is collected by the Monitor for the 
extended MIB and which can be retrieved by the 
io workstation. The summary includes short decriptions of 
the meaning and significance of the variables, where 
appropriate . 

Ttlf f«» n^inent Workstation: 

The Management Workstation is a SUN Sparcstatxon 

15 (also referred to as a Sun) available from Sun 

Microsystems, Inc. It is running the Sun flavor of Unix 
and uses the Open Look Graphical User Interface (GUI) and 
the SunNet Manager as the base system. The options 
required are those to run SunNet Manager with some 

20 additional disk storage requirement. 

The network is represented by a logical map 
illustrating the network components and the relationships 
between them, as shown in Fig. 17. A hierarchical 
network map is supported with navigation through the 

25 layers of the hierarchy, as provided by SNM. The 

Management Workstation determines the topology of the 
network and informs the user of the network objects and 
their connectivity so that he can create a network map. 
To assist with the map creation process, the Management 

30 Workstation attempts to determine the stations connected 
to each LAN segment to which a Monitor is attached. 
Automatic determination of segment topology by detecting 
stations is performed using the autotopology algorithms 
as described in copending U.S. Patent Application S.N. 

35 ***,*** entitled "Automatic Topology Monitor for Multi- 



Segment Local Area Network" filed on January 14 , 1991 
(Attorney Docket No. 13283-NE. APP) , incorporated herein 
by reference. 

In normal operation, each station in the network 
is monitored by a single Monitor that is located on its 
local segment. The initial determination of the Monitor 
responsible for a station is based on the results of the 
autotopology mechanism. The user may override this 
initial default if required. 

The user is informed of new stations appearing on 
any segment in the network via the alarm mechanism. As 
for other alarms, the user may select whether stations 
appearing on and disappearing from the network segment 
generate alarms and may modify the times used in the 
aging algorithms. When a new node alarm occurs, the user 
must add the new, alarm to the map using the SNM tools. 
In this manner, the SNM system becomes aware of the 
nodes . 

The sequence of events following the detection of 
a new node is: 

1. the location of the node is determined 
automatically for the user. 

2. the Monitor generates an alarm for the 
user indicating the new node and providing 
some or all of the following information: 

mac address of node 
ip address of node 

segment that the node is believed to 
be 

located on 
Monitor to be responsible for the 
node 

3 . the user must select the segment and add 
the node manually using the SNM editor 
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4. The update to the SNM database will be 
detected and the file reread. The 
Workstation database is reconstructed and 
the parse control records for the Monitors 
updated if required. 

5. The Monitor responsible for the new node 
has its parse control record updated via 
SNMP set request (s) . 

An internal record of new nodes is required for 
the autotopology. When a new node is reported by a 
Network Monitor, the Management Workstation needs to have 
the previous location information in order to know which 
Network Monitors to involve in autotopology. For 
example, two nodes with the same IP address may exist in 
15 separate segments of the network. The history makes 
possible the correlation of the addresses and it makes 
possible duplicate address detection. 

Before a new Monitor can communicate with the 
Management Workstation via SNMP it needs to be added to 
20 the SNM system files. As the SNM files are cached in the 
database, the file must be updated and the SNM system 

forced to reread it. 

Thus, on the detection of a new Monitor the 
following events need to occur in order to add the 
25 Monitor to the Workstation: 

1. The Monitor issues a trap to the 

Management Workstation software and 
requests code to be loaded from the Sun 
Microsystems boot/ load server. 
30 2. The code load fails as the Monitor is not 
known to the unix networking software at 
this time. 

3. The Workstation confirms that the new 
Monitor does not exceed the configured 
35 system limits (e.g. 5 Monitors per 
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Workstation) and terminates the 
initialization sequence if Units are 
exceeded. An alarm is issued to the user 
indicating the presence of the new Monitor 
5 and whether it can be supported. 

4. The user adds the Monitor to the 

SNMP. HOSTS file of the SNM system, to the 
etc/hosts file of the Unix networking 
system and to the SNM map. 
10 5. When the files have been updated the user 
resets the Monitor using the set tool 
(described later) . 

6. The Monitor again issues a trap to the 
Management Workstation software and 
requests code to be loaded from the Sun 
boot/ load server. 

7. The code load takes place and the Monitor 
issues a trap requesting data from the 
Management Workstation. 

20 8. The Monitor data is issued using SNMP set 
requests . 

Note that on receiving the set request, the SNMP proxy 
rereads in the (updated) SNMP. HOSTS file which now 
includes the new Monitor. Also note that the SNMP hosts 
25 file need only contain the Monitors, not the entire list 
of nodes in the system. 

9. On completion of the set request (s) the Monitor 
run command is issued by the Workstation to bring 
the Monitor on line. 
30 The user is responsible for entering data into the 

SNM database manually. During operation, the Workstation 
monitors the file write date for the SNM database. When 
this is different from the last date read, the SNM 
database is reread and the Workstation database 
35 reconstructed, in this manner, user updates to the SNM 
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database are incorporated into the Workstation database 
as quickly as possible without need for the user to take 
any action. 

When the Workstation is loaded, the database is 
5 created from the data in the SNM file system (which the 
user has possibly updated) . This data is checked for 
consistency and for conformance to the limits imposed by 
the workstation at this time and a warning is generated 
to the user if any problems are seen. If the data errors 

10 are minor the system continues operation; if they are 
fatal the user is asked to correct them and Workstation 
operation terminates. 

The monitoring functions of the Management 
Workstation are provided as an extension to the SNM 

15 system. They consist of additional display tools (i.e., 
summary tool, values tool, and set tool) which the user 
invokes to access the Monitor options and a Workstation 
event log in which all alarms are recorded. 

As a result of the monitoring process, the Monitor 

20 makes a large number of statistics available to the 
operator. These are available for examination via the 
Workstation tools that are provided. In addition, the 
Monitor statistics (or a selected subset thereof) can be 
made visible to any SNMP manager by providing it with 

25 knowledge of the extended MIB. A description of the 
statistics maintained are described elswhere. 

Network event statistics are maintained on a per 
network, per segment and per node basis. Within a node, 
statistics are maintained on a per address (as 

30 appropriate to the protocol layer - IP address, port 
number, ...) and per connection basis. Per network 
statistics are always derived by the Workstation from the 
per segment variables maintained by the Monitors. 
Subsets ot the basic statistics are maintained on a node 
35 to node and segment to segment basis. 
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If the user requests displays of segment to 
segment traffic, the Workstation calculates this data as 
follows. The inter segment traffic is derived from the 
node to node statistics for the intersecting set of 
5 nodes. Thus, if segment A has nodes 1, 2, and 3 and 

segment B has nodes 20, 21, and 22, then summing the node 
to node traffic for 

1 -> 20,21,22 

2 -> 20,21,22 
10 3 -> 20,21,22 

produces the required result. on-LAN/of f-LAN traffic for 
segments is calculated by a simply summing node to node 
traffic for all stations on the LAN and then subtracting 
this from total segment counts. 

15 Alarms are reported to the user in the following 

ways: 

1. Alarms received are logged in a Workstation log. 

2. The node which the alarm relates to is highlighted 
on the map. 

20 3. The node status change is propagated up through 

the (map) hierarchy to support the case where the 
node is not visible on the screen. This is as 
provided by SNM. 
Summary Tool 

25 After the user has selected an object from the map 

and invokes the display tools, the summary tool generates 
the user's initial screen at the Management Workstation. 
It presents a set of statistical data selected to give an 
overview of the operational status of the object (e.g., a 

30 selected node or segment) . The Workstation polls the 

Monitor for the data required by the Summary Tool displav 
screens. 

The Summary Tool displays a basic summary tool 
screen such as is shown in Fig. 18. The summary tool 
35 screen has three panels, namely, a control panel 602, a 
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values panel 604 , and a dialogs panel 606. The control 
panel includes the indicated mouse activated bottons. 
The functions of each of the buttons is as follows. The 
file button invokes a traditional file menu. The view 
5 button invokes a view menu which allows the user to 

modify or tailor the visual protperties of the tool. The 
properties button invokes a properties menu containing 
choices for viewing and sometimes modifying the 
properties of objects. The tools button invokes a tools 

10 menu which provides access to the other Workstation 
tools, e.g. Values Tool. 

The Update Interval field allows the user to 
specify the frequency at which the displayed statistics 
are updated by polling the Monitor. The Update Once 

15 button enables the user to retrieve a single screen 

update. When the Update Once button is invoked not only 
is the screen updated but the update interval is 
automatically set to "none". 

The type field enables the user to specify the 

20 type of network objects on which to operate, i.e., 
segment or node. 

The name button invokes a pop up menu containing 
an alphabetical list of all network objects of the type 
selected and apply and reset buttons. The required name 

25 can then be selected from the (scrolling) list and it 
will be entered in the name field of the summary tool 
when the apply button is invoked. Alternatively, the 
user may enter the name directly in the summary tool name 
field. 

30 The protocol button invokes a pop up menu which 

provides an exclusive set of protocol layers which the 
user may select. Selection of a layer copies the layer 
name into the displayed field of the summary tool when 
the apply operation is invoked. An example of a protocol 

35 selection menu is shown in Fig. 19. It displays the 
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available protocols in the form of a protocol tree with 
multiple protocol faailes. The protocol selection is two 
dimensional. That is, the user first selects the 
protocol family and then the particular layer within that 
5 family. 

As indicated by the protocol trees shown in Fig. 
19, the capabilities of the Monitor can be readily 
extended to handle other protocol families. The 
particular ones which are implemented depend upon the 
10 needs of the particular network environment in which the 
Monitor will operate. 

The user invokes the apply button to indicate that 
the selection process is complete and the type, name, 
protocol, etc. should be applied. This then updates the 
15 screen using the new parameter set that the user 
selected. The reset button is used to undo the 
selections and restore them to their values at the last 
apply operation. 

,» «. < J? 6 ° f 8tatl8tic8 fo * th. selected parameter 
20 set is displayed in values panel 604. The members of the 
sets differ depending upon, for example, what protocol 
was selected. Pigs. 20a-g present examples of the types 
rl S r t * ti8tlCal varia *les which are displayed for the DLL, 
IP, UDP, TCP, icmp, NFS, and ARP/RARP protocols 
respectively. The meaning of the values display fields 
are described in Appendix I, attached hereto. 

Dialogs panel 606 contains a display of the 
connection statistics for all protocols for a selected 
node. Within the Management Workstation, connection 
lists are maintained per node, per supported protocol. 
When connections are displayed, they are sorted on "Last 
Seen- with the most current displayed first, a single 
list returned from the Monitor contains all current 
connection. For TCP, however, each connection also 
contains a state and TCP connections are displayed as 
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Past and Present based upon the returned state of the 
connection. For certain dialogs, such as TCP and NFS 
over UDP, there is an associated direction to the dialog, 
i.e., from the initiator (source) to the receiver (sink). 
5 For these dialogs, the direction is identified in a DIR. 
field. A sample of information that is displayed in 
dialogs panel 606 is presented in Fig. 21 for current 
connections . 
Values Tool 

10 The values tool provides the user with the abilxty 

to look at the statistical database for a network object 
in detail. When the user invokes this tool, he may 
select a basic data screen containing a rate values panel 
620, a count values panel 622 and a protocols seen panel 
15 626, as shown in Fig. 22, or he may select a traffic 
matrix screen 628, as illustrated in Fig. 23. 

In rate values and count values panels 620 and 
622, value tools presents the monitored rate and count 
statistics, respectively, for a selected protocol. The 
20 parameters which are displayed for the different 

protocols (i.e., different groups) are listed in Appendix 
II. in general, a data element that is being displayed 
for a node shows up in three rows, namely, a total for 
the data element, the number into the data element, and 
25 the number out of the data element. Any exceptions to 
this are identified in Appendix II. Data elements that 
are displayed for segments, are presented as totals only, 
with no distinction between Rx and Tx. 

When invoked the Values Tool displays a primary 
30 screen to the user. The primary screen contains what is 
considered to be the most significant information for the 
selected object. The user can view other information for 
the object (i.e., the statistics for the other 
parameters) by scrolling down. 
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The displayed information for the count values and 
rate values panels 620 and 622 includes the following. 
An alarm field reports whether an alarm is currently 
active for this item. It displays as ••*'• if active alarm 
5 is present. A Current Value /Rate field reports the 

current rate or the value of the counter used to generate 
threshold alarms for this item. This is reset following 
each threshold trigger and thus gives an idea of how 
close to an alarm threshold the variable is. A Typical 
10 Value field reports what this item could be expected to 
read in a "normal" operating situation. This field is 
filled in for those items where this is predictable and 
useful. It is maintained in the Workstation database and 
is modifiable by the user using the set tool. An 
15 Accumulated Count field reports the current accumulated 
value of the item or the current rate. A Max Value field 
reports the highest value recently seen for the item. 
This value is reset at intervals defined by a user 
adjustable parameter (default 30 minutes) . This is not a 
20 rolling cycle but rather represents the highest value 
since it was reset which may be from l to 30 minutes ago 
(for a rest period of 30 minutes) . It is used only for 
rates. A Min Value field reports the lowest value 
recently seen for the item. This operates in the same 
25 manner as Max Value field and is used only for rates. 

A Percent (%) field reports only for the following 
variables : 

off seg counts: 

100 (in count / total off seg count) 
30 100 (out count / total off seg count) 

100 (transit count / total off seg count) 
100 (local count / total off seg count) 
off seg rates 

100 (transit rate / total off seg rate), etc. 
35 protocols 
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100 (frame rate this protocol / total frame 
rate) 

On the right half of the basic display, there the 
following additional fields; a High Threshold field and a 
Sample period for rates field. 
Set Tool 

The set tool provides the user with the ability to 
modify the parameters controling the operation of the 
Monitors and the Management Workstation. These 
parameters affect both user interface displays and the 
actual operation of the Monitors. The parameters which 
can be operated on by the set tool can be divided into 
the following categories: alarm thresholds, monitoring 
control, segment Monitor administration, and typical 
15 values. 

The monitoring control variables specify the 
actions of the segment Monitors and each Monitor can have 
a distinct set of control variables (e.g., the parse 
control records that are described elsewhere) . The user 
is able to define those nodes, segments, dialogs and 
protocols in which he is interested so as to make the 
best use of memory space available for data storage. 
This mechanism allows for load sharing, where mulitple 
Monitors on the same segment can divide up the total 
number of network objects which are to be monitored so 
that no duplication of effort between them takes place. 

The monitor administration variables allow the 
user to modify the operation of the segment Monitor in a 
more direct manner than the monitoring control variables. 
Using the set tool, the user can perform those operations 
such as reset, time changes etc. which are normally the 
prerogative of a system administrator. 

Note that the above descriptions of the tools 
available through the Management Workstation are not 
35 meant to imply that other choices may not be made 
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regarding the particular information which is displayed 
and the manner in which it is displayed. 
Actively SPftina Network M onitor- Tlira c h?1 ^. 

The workstation sets the thresholds in the Network 
5 Monitor based upon the performance of the system as 
observed over an extended period of time. That i S/ the 
Workstation periodically samples the output of the 
Network Monitors and assembles a model of a normally 
functioning network. Then, the Workstation sets the 
10 thresholds in the Network Monitors based upon that model. 
If the observation period is chosen to be long enough and 
since the model represents the "average- of the network 
performance over the observation period, temporary 
undesired deviations from normal behavior are smoothed 
out over time and model tends to accurately reflect 
normal network behavior. 

Referring the Fig. 24, the details of the training 
procedure for adaptively setting the Network Monitor 
thresholds are as follows. To begin training, the 
workstation sends a start learning command to the Network 
Monitors from which performance data is desired (step 
302) . The start learning command disables the thresholds 
within the Network Monitor and causes the Network Monitor 
to periodically send data for a predefined set of network 
parameters to the Management Workstation. (Disabling the 
thresholds, however, is not necessary, one could have 
the learning mode operational in parallel with monitoring 
using existing thresholds.) The set of parameters may be 
any or all of the previously mentioned parameters for 
30 which thresholds are or may be defined. 

Throughout the learning period, the Network 
Monitor sends "snapshots" of the network's performance to 
the Workstation which, in turn, stores the data in a 
performance history database 306 (step 304). The network 
35 manager sets the length of the learning period. 
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Typically, it should be long enough to include the full 
range of load conditions that the network experiences so 
that a representative performance history is generated. 
It should also be long enough so that short periods of 
5 overload or faulty behavior do not distort the resulting 

averages . . . 

After the learning period has expired, the network 

manager, through the Management Workstation, sends a stop 
learning command to the Monitor (step 308) . The Monxtor 
10 ceases automatically sending further performance data 
updates to the Workstation and the Workstation processes 
the data in its performance history database (step 310) . 
The processing may involve simply computing averages for 
the parameters of interest or it may involve more 
15 sophisticated statistical analysis of the data; such as 
computing means, standard deviations, maximum and minimum 
values, or using curve fitting to compute rates and other 
pertinent parameter values. 

After the Workstation has statistically analyzed 
20 the performance data, it computes a new set of thresholds 
for the relevant performance parameters (step 312) . To 
do this, it uses formulas which are appropriate to the 
particular parameter for which a threshold is being 
computed. That is, if the parameter is one for which one 
25 would expect to see wide variations in its value during 
network monitoring, then the threshold should be set high 
enough so that the normal expected variations do not 
trigger alarms. On the other hand, if the parameter is 
of a type for which only small variations are expected 
30 and larger variations indicate a problem, then the 

threshold should be set to a value that is close to the 
average observed value. Examples of formulae which may 
be used to compute thresholds are: 

* Highest value seen during learning period; 



- 65 - 

* Highest value seen during learning period + 
10%; 

* Highest value seen during learning period + 
50%; 

5 * Highest value seen during learning period + 

user-defined percent; 

* Any value of the parameter other than zero; 

* Average value seen during learning period + 
50%; and 

10 * Average value seen during learning period + 

user-defined percent. 
As should be evident from these examples, there is a 
broad range of possibilities regarding how to compute a 
particular threshold. The choice, however, should 

15 reflect the parameter's importance in signaling serious 
network problems and its normal expected behavior (as may 
be evidenced from the performance history acquired for 
the parameter during the learning mode) . 

After the thresholds are computed, the Workstation 

20 loads them into the Monitor and instructs the Monitor to 
revert to normal monitoring using the new thresholds 
(step 314) . 

This procedure provides a mechanism enabling the 
network manager to adapt ively reset thresholds in 

25 response to changing conditions on the network, shifting 
usage patterns and evolving network topology. As the 
network changes over time, the network manager merely 
invokes the adaptive threshold setting feature and 
updates the thresholds to reflect those changes. 

30 The Diagnostic An alyzer Module - 

The Management Workstation includes a diagnostic 
analyzer module which automatically detects and diagnoses 
the existence and cause of certain types of network 
problems. The functions of the diagnostic module may 

3 5 actually be distributed among the Workstation and the 
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Network Monitors which are active on the network, 
principle, the diagnostic analyzer module includes the 
following elements for performing its fault detection and 
analysis- functions. 
5 The Management Workstation contains a reference 

model of a normally operating network. The reference 
model is generated by observing the performance of the 
network over an extended period of time and computing 
averages of the performance statistics that were observed 
10 during the observation period. The reference model 
provides a reference against which future network 
performance can be compared so as to diagnose and analyze 
potential problems. The Network Monitor (in particular, 
the STATS module) includes alarm thresholds on a selected 
set of the parameters which it monitors. Some of those 
thresholds are set on parameters which tend to be 
indicative of the onset or the presence of particular 

network problems. 

During monitoring, when a Monitor threshold is 
exceeded, thereby indicating a potential problem (e.g. in 
a TCP connection) , the Network Monitor alerts the 
Workstation by sending an alarm. The Workstation 
notifies the user and presents the user with the option 
of either ignoring the alarm or invoking a diagnostic 
algorithm to analyze the problem. If the user invokes 
the diagnostic algorithm, the Workstation compares the 
current performance statistics to its reference model to 
analyze the problem and report its results. (Of course, 
this may also be handled automatically so as to not 
require user intervention.) The Workstation obtains the 
data on current performance of the network by retrieving 
the relevant performance statistics from all of the 
segment Network Monitors that may have information useful 
to diagnosing the problem. 
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The details of a specific example involving poor 
TCP connection performance will now be described. This 
example refers to a typical network on which the 
diagnostic analyzer resides, such as the network 
5 illustrated in Fig. 25. It includes three segments 

labelled si, S2, and S3, a router Ri connecting SI to S2, 
a router R2 connecting S2 to S3, and at least two nodes, 
node A on SI which communicates with node B on S3. On 
each segment there is also a Network Monitor 324 to 
10 observe the performance of its segment in the manner 

described earlier. A Management Workstation 320 is also 
located on Si and it includes a diagnostic analyzer 
module 322. For this example, the sympton of the network 
problem is degraded peformance of a TCP connection 
15 between Nodes A and B. 

A TCP connection problem may manifest itself in a 
number of ways, including, for example, excessively high 
numbers for any of the following: 
errors 

20 packets with bad sequence numbers 

packets retransmitted 
bytes retransmitted 
out of order packets 
out of order bytes 
25 packets after window closed 

bytes after window closed 
average and maximum round trip times 
or by an unusually low value for the current window size. 
By setting the appropriate thresholds, the Monitor is 
30 programmed to recognize any one or more of these 

symptons. If any one of of the thresholds is exceeded, 
the Monitor sends an alarm to the Workstation. The 
Workstation is programmed to recognize the particular 
alarm as related to an event which can be further 
35 analyzed by its diagnostic analyzer module 322. Thus, 
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the workstation presents the user with the option of 
invoking its diagnostic capabilities (or automatically 
invokes the diagnostic capabilities) . 

In general terms, when the diagnostic analyzer is 
5 invoked, it looks at the performance data that the 
segment Monitors produce for the two nodes, for the 
dialogs between them and for the links that interconnect 
them and compares that data to the reference model for 
the network. If a significant divergence from the 
10 reference model is identified, the diagnostic analyzer 
informs the Workstation (and the user) about the nature 
of the divergence and the likely cause of the problem. 
In conducting the comparison to "normal" network 
performance, the network circuit involved in 
15 communications between nodes A and B is decomposed into 
its individual components and diagnostic analysis is 
performed on each link individually in the effort to 
isolate the problem further . 

The overall structure of the diagnostic algorithm 
20 400 is shown in Fig. 26. When invoked for analyzing a 
possible TCP problem between nodes A and B, diagnostic 
analyzer 322 checks for a TCP problem at node A when it 
is acting as a source node (step 402). To perform this 
check, diagnostic algorithm 400 invokes a source node 
25 analyzer algorithm 450 shown in Fig. 27. If a problem is 
identified, the Workstation reports that there is a high 
probability that node A is causing a TCP problem when 
operating as a source node and it reports the results of 
the investigation performed by algorithm 450 (step 404). 
30 If node A does not appear to be experiencing a TCP 

problem when acting as a source node, diagnostic analyzer 
322 checks for evidence of a TCP problem at node B when 
it is acting as a sink node (step 406) . To perform this 
check, diagnostic algorithm 400 invokes a sink node 
35 analyzer algorithm 470 shown in Fig. 28. If a problem is 
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identified, the Workstation reports that there is a high 
probability that node B is causing a TCP problem when 
operating as a sink node and it reports the results of 
the investigation performed by algorithm 470 (stop 408) . 
5 Mote that source and sink nodes are concepts which 

apply to those dialogs for which a direction of the 
communication can be defined. For example, the source 
node may be the one which initiated the dialog for the 
purpose of sending data to the other node, i.e., the sink 
10 node . 

If node B does not appear to be experiencing a TCP 
problem when acting as a sink node, diagnostic analyzer 
322 checks for evidence of a TCP problem on the link 
between Node A and Node B (step 410) . To perform this 

15 check, diagnostic algorithm 400 invokes a link analysis 
algorithm 550 shown in Fig. 29. If a problem is 
identified, the Workstation reports that there is a high 
probability that a TCP problem exists on the link and it 
reports the results of the investigation performed by 

20 link analysis algorithm 550 (step 412) . 

If the link does not appear to be experiencing a 
TCP problem, diagnostic analyzer 322 checks for evidence 
of a TCP problem at node B when it is acting as a source 
node (step 414). To perform this check, diagnostic 

25 algorithm 400 invokes the previously mentioned source 
algorithm 450 for Node B. If a problem is identified, 
the Workstation reports that there is a medium 
probability that node B is causing a TCP problem when 
operating as a source node and it reports the results of 

30 the investigation performed by algorithm 450 (step 416) . 

If node B does not appear to be experiencing a TCP 
problem when acting as a source node, diagnostic analyzer 
322 checks for a TCP problem at node A when it is acting 
as a sink node (step 418) . To perform this check, 

35 diagnostic algorithm 400 invokes sink node analyzer 
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algorithm 470 for Node A. If a problem is identified, 
the Network Monitor reports that there is a medium 
probability that node A is causing a TCP problem when 
operating as a sink node and it reports the results of 
the investigation performed by algorithm 470 (step 420). 

Finally, if node A does not appear to be 
experiencing a TCP problem when acting as a sink node, 
diagnostic analyzer 322 reports that it was not able to 
isolate the cause of a TCP problem (step 422) . 

The algorithms which are called from within the 
above-described diagnostic algorithm will now be 
described. Referring to Fig. 27, source node analyzer 
algorithm 450 checks whether a particular node is causing 
a TCP problem when operating as a source node. The 
strategy is as follows. To determine whether a TCP 
problem exists at this node which is the source node for 
the TCP connection, look at other connections for which 
this node is a source. If other TCP connections are 
okay, then there is probably not a problem with this 
node. This is an easy check with a high probability of 
being correct. If no other good connections exist, then 
look at the lower layers for possible reasons. Start at 
DLL and work up as problems at lower layers are more 
fundamental, i.e., they cause problems at higher layers 
25 whereas the reverse is not true. 

in accordance with this approach, algorithm 450 
first determines whether the node is acting as a source 
node in any other TCP connection and, if so, whether the 
other connection is okay (step 452) . If the node is 
performing satisfactorily as a source node in another TCP 
connection, algorithm 450 reports that there is no 
problem at the source node and returns to diagnostic 
algorithm 400 (step 454) . If algorithm 450 cannot 
identify any other TCP connections involving this node 
that are okay, it moves up through the protocol stack 
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checking each level for a problem. In this case, it then 
checks for DLL problems at the node when it is acting as 
a source node by calling an DLL problem checking routine 
510 (see Fig. 30) (step 456). if a DLL problem is found, 
that fact is reported (step 458) . if no DLL problems are 
found, algorithm 450 checks for an IP problem at the node 
when it is acting as a source by calling an IP problem 
checking routine 490 (see Pig. 31) (step 460) . if an IP 
problem is found, that fact is reported (step 462). If 
no IP problems are found, algorithm 450 checks whether 
any other TCP connection in which the node participates 
as a source is not okay (step 464). if another TCP 
connection involving the node exists and it is not okay, 
algorithm 450 reports a TCP problem at the node (step 
15 466) . if no other TCP connections where the node is 
acting as a source node can be found, algorithm 450 
exits . 

Referring to Pig. 28, sink node analyzer algorithm 
470 checks whether a particular node is causing a TCP 
20 problem when operating as a sink node. It first 

determines whether the node is acting as a sink node in 
any other TCP connection and, if so, whether the other 
connection is okay (step 472). if the node is performing 
satisfactorily as a sink node in another TCP connection, 
25 algorithm 470 reports that there is no problem at the 

source node and returns to diagnostic algorithm 400 (step 
474). if algorithm 470 cannot identify any other TCP 
connections involving this node that are okay, it then 
checks for DLL problems at the node when it is acting cs 
a sink node by calling DLL problem checking routine 5io 
(step 476) . if a DLL problem is found, that fact is 
reported (step 478) . if no DLL problems are found, 
algorithm 470 checks for an IP problem at the node when 
it is acting as a sink by calling IP problem checking 
35 routine 490 (step 480) . if an IP problem is found, that 
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fact is reported (step 482). If no IP problems are 
found, algorithm 470 checks whether any other TCP 
connection in which the node participates as a sink is 
not okay- (step 484) . If another TCP connection involving 
5 the node as a sink exists and it is not okay, algorithm 
470 reports a TCP . problem at the node (step 486). If no 
other TCP connections where the node is acting as a sink 
node can be found, algorithm 470 exits. 

Referring to Fig.. 31, IP problem checking routine 
10 490 checks for IP problems at a node. It does this by 
comparing the IP performance statistics for the node to 
the reference model (steps 492 and 494). If it detects 
any significant deviations from the reference model, it 
reports that there is an IP problem at the node (step 
15 496) . If no significant deviations are noted, it reports 
that there is no IP problem at the node (step 498) . 

As revealed by examining Fig. 30, DLL problem 
checking routine 510 operates in a similar manner to IP 
problem checking routine 490, with the exception that it 
20 examines a different set of parameters (i.e., DLL 
parameters) for significant deviations. 

Referring the Fig. 29, link analysis logic 550 
first determines whether any other TCP connection for the 
link is operating properly (step 552) . if a properly 
25 operating TCP connection exists on the link, indicating 
that there is no link problem, link analysis logic 550 
reports that the link is okay (step 554) . If a properly 
operating TCP connection cannot be found, the link is 
decomposed into its constituent components and an IP link 
30 component problem checking routine 570 (see Fig. 32) is 
invoked for each of the link components (step 556) . IP 
link component problem routine 570 evaluates the link 
component by checking the IP layer statistics for the 
relevant link component. 



e 



The decomposition of the link into its components 
arranges them in order of their distance from the source 
node and the analysis of the components proceeds in that 
order. Thus, for example, the link components which make 
5 up the link between nodes A and B include in order: 

segment SI, router Rl, segment S2, router R2, and segment 
S3* The IP data for these various components are 
analyzed in the following order: 

IP data for segment Si 
10 IP data for address Rl 

IP data for source node to Rl 
IP data for SI to S2 
IP data for S2 
IP data for address R2 
15 IP data for S3 

IP data for S2 to S3 
IP data for SI to S3 
As shown in Fig. 32, IP link component problem 
checking routine 570 compares IP statistics for the link 
20 component to the reference model (step 572) to determine 
whether network performance deviates significantly from 
that specified by the model (step 574). If significant 
deviations are detected, routine 570 reports that there 
is an IP problem at the link component (step 576) . 
25 Otherwise, it reports that it found no IP problem (step 
578) . 

Referring back to Fig. 29, after completing the IP 
problem analysis for all of the link components, logic 
550 then invokes a DLL link component problem checking 
30 routine 580 (see Fig. 33) for each link component to 
check its DLL statistics (step 558) . 

DLL link problem routine 58 0 is similar to IP link 
problem routine 570. As shown in Fig. 33, DLL link 
problem checking routine 580 compares DLL statistics for 
3 5 the link to the reference model (step 582) to determine 
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whether network performance at the DLL deviates 
significantly from that specified by the model (step 
584). If significant deviations are detected, routine 
580 reports that there is a DLL problem at the link 
component (step 586) . Otherwise, it reports that no DLL 
problems were found (step 588) . 

Referring back to Fig. 29, after completing the 
DLL problem analysis for all of the link components, 
logic 550 checks whether there is any other TCP on the 
link (step 560) . If another TCP exists on the link 
(which implies that the other TCP is also not operating 
properly) , logic 550 reports that there is a TCP problem 
on the link (step 562) . Otherwise, logic 550 reports 
that there was not enough information from the existing 
15 packet traffic to determine whether there was a link 

problem (step 564) 

If the analysis of the link components does not 
isolate the source of the problem and if there were 
components for which sufficient information was not 
20 available (due possibly to lack of traffic over through 
that component) , the user may send test messages to those 
components to generate the information needed to evaluate 

its performance. 

The reference model against which comparisons 

25 are made to detect and isolate malfunctions may be 

generated by examining the behavior of the network over 
an extended period of operation or over multiple periods 
of operation. During those periods of operation, average 
values and maximum excursions (or standard deviations) 

30 for observed statistics are computed. These values 
provide an initial estimate of a model of a properly 
functioning system. As more experience with the network 
is obtained and as more historical data on the various 
statistics is accumulated the thresholds for detecting 

35 actual malfunctions or imminent malfunctions and the 
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reference model can be revised to reflect the new 
experience. 

What constitutes a significant deviation from the 
reference model depends upon the particular parameter 
5 involved. Some parameters will not deviate from the 

expected norm and thus any deviation would be considered 
to be significant, for example, consider ICMP messages of 
type "destination unreachable," IP errors, TCP errors. 
Other parameters will normally vary within a wide range 
10 of acceptable values, and only if they move outside of 
that range should the deviation be considered 
significant. The acceptable ranges of variation can be 
determined by watching network performance over a 
' sustained period of operation. 
15 The parameters which tend to provide useful 

information for identifying and isolating problems at the 
node level for the different protocols and layers include 
the following. 

XQE 

20 error rate 

header byte rate 

packets retransmitted 

bytes retransmitted 

packets after window closed 
25 bytes after window closed 

HBE 

error rate 
header byte rate 

IE 

30 error rate 

header byte rate 
fragmentation rate 

all ICMP messages of type destination 
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unreachable, parameter problem, 
redirection 



DLL 

error rate 



runts 



For diagnosing network segment problems, the above- 
identified parameters are also useful with the addition 
of the alignment rate and the collision rate at the DLL. 
All or some subset of these parameters may be included 
10 among the set of parameters which are examined during the 
diagnostic procedure to detect and isolate network 
problems . 

The above-described technique can be applied to a 
wide range of problems on the network, including among 
15 others, the following: 



DLL not working at all 
For each of these problems, the diagnostic approach would 
be similar to that described above, using, of course, 

25 different parameters to identify the potential problem 
and isolate its cause. 
The Event Timing Module 

Referring again to Fig- 5, the RTP is programmed 
to detect the occurrence of certain transactions for 

30 which timing information is desired. The transactions 
typically occur within a dialog at a particular layer of 
the protocol stack and they involve a first event (i.e., 
an initiating event) and a subsequent partner event or 
response. The events are protocol messages that arrive 



20 



TCP Connection fails to establish 

UDP Connection performs poorly 

UDP not working at all 

IP poor performance /high error rate 

IP not working at all 

DLL poor performance/high error rate 



at the Network Monitor, are parsed by the RTP and then 
passed to Event Timing Module (ETM) for processing. A 
transaction of interest might be, for example, a read of 
a file on a server. In that case, the initiating event 
5 is the read request and the partner event is the read 
response. The time of interest is the time required to 
receive a response to the read request (i.e., the 
transaction time) . The transaction time provides a 
useful measure of network performance and if measured at 
10 various times throughout the day under different load 
conditions gives a measure of how different loads affect 
network response times. The layer of the communicaton 
protocol at which the relevant dialog takes place will of 
course depend upon the nature of the event. 
15 In general, when the RTP detects an event, it 

transfers control to the ETM which records an arrival 
time for the event. If the event is an initiating event, 
the ETM stores the arrival time in an event timing 
database 300 (see Fig. 34) for future use. If the event 
20 is a partner event, the ETM computes a difference between 
that arrival time and an earlier stored time for the 
initiating event to determine the complete traneaction 
time. 

Event timing database 300 is an array of records 
25 302. Each record 302 includes a dialog field 304 for 
identifying the dialog over which the transactions of 
interest are occurring and it includes an entry type 
field 306 for identifying the event type of interest. 
Each record 302 also includes a start time field 308 for 
30 storing the arrival time of the initiating event and an 
average delay time field 310 for storing the computed 
average delay for the transactions. A more detailed 
description of the operation of the ETM follows. 

Referring to Fig. 35, when the RTP detects the 
35 arrival of a packet of the type for which timing 
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information is being kept, it passes control to the ETM 
along with relevant information from the packet, such as 
the dialog identifier and the event type (step 320) . The 
ETM then determines whether it is to keep timing 
5 information for that particular event by checking the 
event timing database (step 322). Since each event type 
can have multiple occurrences (i.e., there can be 
multiple dialogs at a given layer) , the dialog identifier 
is used to distinguish between events of the same type 
10 for different dialogs and to identify those for which 

information has been requested. All of the dialog/events 
of interest are identified in the event timing database. 
If the current dialog and event appear in the event 
timing database, indicating that the event should be 
15 timed, the ETM determines whether the event is a starting 
event or an ending event so that it may be processed 
properly (step 324) . For certain events, the absence of 
a start time in the entry field of the appropriate record 
302 in event timing database 300 is one indicator that 
20 the event represents a start time; otherwise, it is an 
end time event. For other events, the ETM determines if 
the start time is to be set by the event type as 
specified in the packet being parsed. For example, if 
the event is a file read a start time is stored. If the 
event is the read completion it represents an end time. 
In general, each protocol event will have its own 
intrinsic meaning for how to determine start and end 
times . 

Note that the arrival time is only an estimate of 
the actual arrival time due to possible queuing and other 
processing delays. Nevertheless, the delays are 
generally so small in comparison to the transaction times 
being measured that they are of little consequence. 

In step 324, if the event represents a start time, 
35 the ETM gets the current time from the kernal and stores 



25 



30 



it in start time field 308 of the appropriate record in 
event timing database 300 (step 326) . if the event 
represents an end time event, the ETM obtains the current 
time from the kernel and computes a difference between 
5 that time and the corresponding start time found in event 
timing database 300 (step 328) . This represents the 
total time for the transaction of interest. It is 
combined with the stored average transaction time to 
compute a new running average transaction time for that 

10 event (step 330) . 

Any one of many different methods can be used to 
compute the running average transaction time. For 
example, the following formula can be used: 

New Avg. = [(5 * Stored Avg. ) + Transaction 

15 Time]/6. 

After six transaction have been timed, the computed new 
average becomes a running average for the transaction 
times. The ETM stores this computed average in the 
appropriate record of event timing database 300, 
20 replacing the previous average transaction time stored in 
, that record, and it clears start time entry field 308 for 
that record in preparation for timing the next 
transaction . 

After processing the event in steps 322, 326, and 
25 330, the ETM checks the age of all of the start time 

entries in the event timing database 300 to determine if 
any of them are too "old" (step 332). If the difference 
between the current time and any of the start times 
exceeds a preselected threshold, indicating that a 
30 partner event has not occurred within a reasonable period 
of time, the ETM deletes the old start time entry for 
that dialog/event (step 334). This insures that a missed 
packet for a partner event does not result in an 
erroneously large transaction time which throws off the 
35 running average for that event. 
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If the average transaction time increases beyond a 
preselected threshold set for timing events, an alarm is 
sent to the Workstation. 

Two examples will now be described to illustrate 
the operation of the ETM for specific event types. In 
the first example, Node A of Fig. 25 is communicating 
with Node B using the NFS protocol. Node A is the client 
while Node B is the server. The Network Monitor resides 
on the same segment as node A, but this is not a 
requirement. When Node A issues a read request to Node 
B, the Network Monitor sees the request and the RTP 
within the Network Monitor transfers control to the ETM. 
Since it is a read, the ETM stores a start time in the 
Event Timing Database. Thus, the start time is the time 
15 at which the read was initiated. 

After some delay, caused by the transmission 
delays of getting the read message to node B, node B 
performs the read and sends a response back to node A. 
After some further transmission delays in returning the 
20 read response, the Network Monitor receives the second 
packet for the event. At the time, the ETM recognizes 
that the event is an end time event and updates the 
average transaction time entry in the appropriate record 
with a new computed running average. The ETM then 
25 compares the average transaction time with the threshold 
for this event and if it has been exceeded, issues an 
alarm to the Workstation. 

in the second example, node A is communicating 
with Node B using the Telnet protocol. Telnet is a 
30 virtual terminal protocol. The events of interest take 
place long after the initial connection has been 
established. Node A is typing at a standard ASCII (VT100 
class) terminal which is logically (through the network) 
connected to Node B. Node B has an application which is 
35 receiving the characters being typed on Node A and, at 
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appropriate times, indicated by the logic of the 
applications, sends characters back to the terminal 
located on Node A. Thus, every time node A sends 
characters to B, the Network Monitor sees the 
5 transmission. 

In this case, there are several transaction times 
which could provide useful network performance 
information. They include, for example, the amount of 
time it takes to echo characters typed at the keyboard 

10 through the network and back to the display screen, the 
delay between typing an end of line command and seeing 
the completion of the application event come back or the 
network delays incurred in sending a packet and receiving 
acknowledgment for when it was received. 

15 in this example, the particular time being 

measured is the ,time it takes for the network to send a 
packet and receive an acknowledgement that the packet has 
arrived, since Telnet runs on top of TCP, which in turn 
runs on top of IP, the Network Monitor monitors the TCP 

20 acknowledge end-to-end time delays. 

Note that this is a design choice of the 
implementation and that all events visible to the Network 
Monitor by virtue of the fact that information is in the 
packet could be measured. 

25 When Node A transmits a data packet to Node B, the 

Network Monitor receives the packet. The RTP recognizes 
the packet as being part of a timed transaction and 
passes control to the ETM. The ETM recognizes it as a 
start time event, stores the start time in the event 

30 timing database and returns control to the RTP after 
checking for aging. 

When Node B receives the data packet from Node A, 
it sends back an acknowledgment packet. When the Network 
Monitor sees that packet, it delivers the event to the 

35 ETM, which recognizes it as an end time event. The ETM 
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calculates the delay time for the complete transaction 
and uses that to update the average transaction time. 
The ETM then compares the new average transaction time 
with the threshold for this event. If it has been 
exceeded, the ETM issues an alarm to the Workstation. 

Note that this example is measuring something very 
different than the previous example. The first example 
measures the time it takes to traverse the network, 
perform an action and return that result to the 
requesting node. It measures performance as seen by the 
user and it includes delay times from the network as well 
as delay times from the File Server. 

The second example is measuring network delays 
without looking at the service delays. That is, the ETM 
is measuring the amount of time it takes to send a packet 
to a node and receive the acknowledgement of the receipt 
of the message. In this example, the ETM is measuring 
transmissions delays as well as processing delays 
associated with network traffic, but nop anything having 
20 to do with non-network processing. 

As can be seen from the above examples, the ETM 
can measure a broad range of events. Each of these 
events can be measured passively and without the 
cooperation of the nodes that are actually participating 
25 in the transmission. 

Th o ft^reas Tr»r-Ver Module fATMl 

Address tracker module (ATM) 43, one of the 
software modules in the Network Monitor (see Fig. 5) , 
operates on networks on which the node addresses for 
30 particular node to node connections are assigned 

dynamically. An Appletalk* Network, developed by Apple 
computer Company, is an example of a network which uses 
dynamic node addressing. In such networks, the dynamic 
change in the address of a particular service causes 
35 difficulty troubleshooting the network because the 
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network manager may not know where the various nodes are 
and what they are called. in addition, foreign network 
addresses (e.g., the IP addresses used by that ncda for 
communication over an IP network to which if is 
5 connected) can not be relied upon to point to a 

particular node. ATM 43 solves this problem by passively 
monitoring the network traffic and collecting a table 
showing the node address to node name mappings. 

In the following description, the network on which 
10 the Monitor is located is assumed to be an Appletalk* 
Network. Thus, as background for the following 
discussion, the manner in which the dynamic node 
addressing mechanism operates on that network will first 
be described. 

15 When a node is activated on the Appletalk* 

Network, it establishes its own node address in 
accordance with protocol referred to as the Local Link 
Access Protocol (LLAP) . That is, the node guesses its 
own node address and then verifies that no other node on 

20 the network is using that address. The node verifies the 
uniqueness of its guess by sending an LLAP Enquiry 
control packet informing all other nodes on the network 
that it is going to assign itself a particular address 
unless another node responds that the address has already 

25 been assigned. If no other node claims that address as 
its own by sending an LLAP acknowledgment control packet, 
the first node uses the address which it has selected. 
If another node claims the address as its own, the first 
node tries another address. This continues until, the 

30 node finds an unused address. 

When the first node wants to communicate with a 
second node, it must determine the dynamically assigned 
node address of the second node, it does this in 
accordance with another protocol referred to as the Name 

35 Binding Protocol (NBP) . The Name Binding Protocol is 
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used to map or bind human understandable node names with 
machine understandable node addresses. The NBP allows 
nodes to dynamically translate a string of characters 
(i.e., a node name) into a node address. The node 
needing to communicate with another node broadcasts an 
NBP Lookup packet containing the name for which a node 
address is being requested. The node having the name 
being requested responds with its address and returns a 
Lookup Reply packet containing its address to the 
original requesting node. The first node then uses that 
address its current communications with the second node. 

Referring to Fig. 36, the network includes an 
Appletalk* Network segment 702 and a TCP/IP segment 704, 
each of which are connected to a larger network 706 
15 through their respective gateways 708. A Monitor 710, 
including a Real Time Parser (RTP) 712 and an Address 
Tracking Module (ATM) 714, is located on Appletalk 
network segment 702 along with other nodes 711. A 
Management Workstation 716 is located on segment 704. It 
is assumed that Monitor 7i0 has the features and 
capabilities previously described; therefore, those 
features not specifically related to the dynamic node 
addressing capability will not be repeated here but 
rather the reader is referred to the earlier discussion, 
suffice it to say that Monitor 710 is, of course, adapted 
to operate on Appletalk Network segment 702, to parse and 
analyze the packets which are transmitted over that 
segment according to the Appletalk* family of protocols 
and to communicate the information which it extracts from 
30 the network to Management Workstation 716 located on 
segment 704. 

Within Monitor 710, ATM 714 maintains a name table 
data structure 730 such as is shown in Fig. 37. Name 
Table 720 includes records 722, each of which has a node 
35 name field 724, a node address field 726, an IP address 
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field 728, and a time field 729. ATM 714 uses Name Table 
720 to keep track of the mappings of node names to node 
address and to IP address* The relevance of each of the 
fields of records 722 in Name Table 72 0 are explained in 
5 the following description of how ATM 714 operates. 

In general, Monitor 710 operates as previously 
described. That is, it passively monitors all packet 
traffic over segment 702 and sends all packets to RTP 712 
for parsing. When RTP 712 recognizes an Appletalk 

10 packet, it transfers control to ATM 714 which analyzes 
the packet for the presence of address mapping 
information. 

The operation of ATM 714 is shown in greater 
detail in the flow diagram of Fig. 38. When ATM 714 

15 receives control from RTP 712, it takes the packet (step 
730 and strips off the lower layers of the protocol until 
it determines whether there is a Name Binding Protocol 
message inside the packet (step 732). If it is a NBP 
message, ATM 714 then determines whether it is new name 

20 Lookup message (step 734). If it is a new name Lookup 
message, ATM 714 extracts the name from the message 
(i.e., the name for which a node address is being 
requested) and adds the name to the node name field 724 
of a record 722 in Name Table 720 (step 736) • 

25 If the message is an NBP message but it is not a 

Lookup message, ATM 714 determines whether it is a Lookup 
Reply (step 738) . If it is a Lookup Reply, signifying 
that it contains a node name/ node address binding, ATM 
714 extracts the name and the assigned node address from 

3 0 the message and adds this information to Name Table 720. 
ATM 714 does this by searching the name fields of records 
722 in Name Table 720 until it locates the name.' Then, 
it updates the node address field of the identified 
record to contain the node address which was extracted 

35 from the received NBP packet. ATM 714 also updates time 
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field 729 to record the time at which the message was 
processed % 

After ATM 714 has updated the address field of the 
appropriate record, it determines whether any records 722 
5 in Name Table 720 should be aged out (step 742). ATM 714 
compares the current time to the times recorded in the 
time fields. If the elapsed time is greater than a 
preselected time period (e.g. 48 hours), ATM 714 clears 
the record of all information (step 744) . After that, it 
10 awaits the next packet from RTP 712. 

As ATM 714 is processing each a packet and it 
determines either that it does not contain an NBP message 
(step 732) or it does not contain a Lookup Reply message 
(step 738) , ATM 714 branches to step 742 to perform the 
15 age out check before going on to the next packet from RTP 
712. 

The Appletalk to IP gateways provide services that 
allow an Appletalk Node to dynamically connect to an IP 
address for communicating with IP nodes. This service 

20 extends the dynamic node address mechanism to the IP 
world for all Appletalk nodes. While the flexibility 
provided is helpful to the users, the network manager is 
faced with the problem of not knowing which Appletalk 
Nodes are currently using a particular IP address and 

25 thus, they can not easily track down problems created by 
the particular node. 

ATM 714 can use passive monitoring of the IP 
address assignment mechanisms to provide the network 
manager a Name-to-IP address mapping. 

3 0 If ATM 714 is also keeping IP address information, 

it implements the additional steps shown in Fig. 39 after 
completing the node name to node address mapping steps. 
ATM 714 again checks whether it is an NBP message (step 
748) . If it is an NBP message, ATM 714 checks whether it 

35 is a response to an IP address request (step 750) . IP 
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address requests are typically implied by an NBP Lookup 
request for an IP gateway. The gateway responds by 
supplying the gateway address as well as an IP address 
that is assigned to the requesting node. If the NBP 
5 message is an IP address response, ATM 714 looks up the 
requesting node in Name Table 720 (step 752) and stores 
the IP address assignment in the IP address field of the 
appropriate record 722 (step 754) . 

After storing the IP address assignment 
10 information, ATM 714 locates all other records 722 in 

Name Table 720 which contain that IP address. Since the 
IP address has been assigned to a new node name, those 
old entries are no longer valid and must be eliminated. 
Therefore, ATM 714 purges the IP address fields of those 
15 records (step 756) . After doing this cleanup step, ATM 
714 returns control to RTP 712. 

Other embodiments are within the following claims. 
For example, the Network Monitor can be adapted to 
identify node types by analyzing the type of packet 
20 traffic to or from the node. if the node being monitored 
is receiving mount requests, the Monitor would report 
that the node is behaving like node a file server. If 
the node is issuing routing requests, the Monitor would 
report that the node is behaving like a router, in 
25 either case, the network manager can check a table of 
what nodes are permitted to provide what functions to 
determine whether the node is authorized to function as 
either a file server or a router, and if not, can take 
appropriate action to correct the problem. 
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APPENDIX I 



SNMP MIB Subset Supported 

This is the subset of the standard MIB which - 
obtained by monitoring. 

Refer to RFC 1066 Management Information Base 
explanation on the items which follow. 

System group: 
none 

Interfaces group 
ifType 

ifPhysAddress 

ifOper Status 

iflnOctets 

iflnUcastPkts 

iflnNUcastPkts 

ifoutoctets 

ifOutUcastPkts 

ifOutNUcastPkts 

Address Translation group 
none 

IP group 

ipForwarding 

ipDefaultTTL 

ipInReceives 

ipInHdrErrors 

ipInAddr Errors 

ipForwDatagrams 

ipReasmReqds 

ipFragCreates 

IP Address Table 

ipAddress 

ipAdEntBcastAddr 

IP Routing Table 
none 

ICMP group 

icmpInMsgs 

icmplnErrors 

icmpInDestUnreachs 

icmplnTimeExcds 

icmpInParmProbs 

icmpInSrcQuenchs 

icmpInRedirects 

icmplnEchoes 
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